Similar literature
 18 similar articles found; search time: 15 ms
1.
Admissible clustering procedures   Cited by: 2 (self-citations: 0, citations by others: 2)

2.
3.
Clustering of microarray gene expression data is performed routinely, for genes as well as for samples. Clustering of genes can reveal functional relationships between genes; clustering of samples, on the other hand, is important for finding, e.g., disease subtypes, relevant patient groups for stratification, or related treatments. Usually this is done by first filtering the genes for high variance, under the assumption that they carry most of the information needed for separating different sample groups. If this assumption is violated, important groupings in the data might be lost. Furthermore, classical clustering methods do not facilitate the biological interpretation of the results. Therefore, we propose to methodologically integrate the clustering algorithm with prior biological information. This differs from other approaches in that knowledge about classes of genes can be used directly to ease the interpretation of the results and possibly boost clustering performance. Our approach computes dendrograms that resemble decision trees, with gene classes used to split the data at each node, which can help to find biologically meaningful differences between the sample groups. We have tested the proposed method on both simulated and real data and conclude that it is useful as a complementary method, especially when the assumptions of few differentially expressed genes and an informative mapping of genes to different classes are met.

4.
5.
Tseng GC, Wong WH. Biometrics, 2005, 61(1): 10-16
In this article, we propose a method for clustering that produces tight and stable clusters without forcing all points into clusters. The methodology is general but was initially motivated by cluster analysis of microarray experiments. Most current algorithms aim to assign all genes to clusters. For many biological studies, however, we are mainly interested in identifying the most informative, tight, and stable clusters of, say, 20-60 genes for further investigation. We want to avoid contaminating the tightly regulated expression patterns of biologically relevant genes with other genes whose expression is only loosely compatible with these patterns. "Tight clustering" has been developed specifically to address this problem. It applies K-means clustering as an intermediate clustering engine. Early truncation of a hierarchical clustering tree is used to overcome the local minimum problem in K-means clustering. The tightest and most stable clusters are identified sequentially through an analysis of the tendency of genes to be grouped together under repeated resampling. We validated this method in a simulated example and applied it to a set of expression profiles in a study of embryonic stem cells.

6.
Hoff PD. Biometrics, 2005, 61(4): 1027-1036
This article develops a model-based approach to clustering multivariate binary data, in which the attributes that distinguish a cluster from the rest of the population may depend on the cluster being considered. The clustering approach is based on a multivariate Dirichlet process mixture model, which allows for the estimation of the number of clusters, the cluster memberships, and the cluster-specific parameters in a unified way. Such a clustering approach has applications in the analysis of genomic abnormality data, in which the development of different types of tumors may depend on the presence of certain abnormalities at subsets of locations along the genome. Additionally, such a mixture model provides a nonparametric estimation scheme for dependent sequences of binary data.

7.
Summary: CLUSLA, a computer program for the clustering of very large phytosociological data sets, is described. It is an elaboration of Janssen's (1975) simple procedure. The essence of the program is the creation of clusters, each starting with one relevé, as the relevés are entered into the program. Each new relevé that is sufficiently distinct from the already existing clusters is considered a new cluster. The fusion criterion is the attainment of a certain level of (dis)similarity between relevé and cluster. Bray and Curtis' dissimilarity measure with presence-absence data was used. The program, written in FORTRAN for an IBM 370-158 system, can deal with practically unlimited numbers of relevés, provided the product of the number of primary clusters and the number of species does not exceed 140,000. We adopted maxima of 100 and 1400, respectively. After the primary clustering round a reallocation is performed. Then a simple table is printed with information on the significance of occurrence of species in clusters according to a chi-square approach. The primary clusters can be treated again with a higher fusion threshold, or approached with more elaborate methods, in our case particularly the TABORD program. The program is demonstrated with a collection of 6072 relevés with 889 species of salt marsh vegetation from the Working Group for Data-Processing. Contribution from the Working Group for Data-Processing in Phytosociology, International Society for Vegetation Science. Nomenclature follows the Trieste system, which will be published later. The authors are very grateful to Drs. Jan Janssen, Mike Dale, László Orlóci and Mike Austin for their comments on drafts of the program, and to Wil Kortekaas for her help in the interpretation of the tables.
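The single-pass procedure described in this abstract can be illustrated with a small sketch. This is not the CLUSLA program itself: the function names are hypothetical, each relevé is assumed to be a set of species names, and a cluster is assumed to be summarized by the union of its members' species (the abstract does not say how a cluster is represented). The reallocation round and the chi-square table are omitted.

```python
def bray_curtis_presence_absence(a, b):
    """Bray-Curtis dissimilarity between two species sets.

    For presence-absence data it reduces to 1 - 2*|A & B| / (|A| + |B|).
    """
    if not a and not b:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def leader_clustering(releves, threshold):
    """Single pass: a relevé joins its nearest cluster if the dissimilarity
    is at most `threshold`; otherwise it starts a new cluster."""
    clusters = []  # each cluster: {"species": set, "members": [indices]}
    for index, species in enumerate(releves):
        best, best_d = None, None
        for cluster in clusters:
            d = bray_curtis_presence_absence(species, cluster["species"])
            if best_d is None or d < best_d:
                best, best_d = cluster, d
        if best is not None and best_d <= threshold:
            best["members"].append(index)
            # Assumption: the cluster profile is the union of member species.
            best["species"] |= species
        else:
            clusters.append({"species": set(species), "members": [index]})
    return clusters


releves = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}, {"x", "y", "z"}]
for cluster in leader_clustering(releves, threshold=0.5):
    print(cluster["members"], sorted(cluster["species"]))
```

Because each relevé is compared only with the current clusters, the work per relevé grows with the number of clusters rather than with the number of relevés already read, which is consistent with the abstract's limit on clusters times species rather than on relevés.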

8.
Summary: Multivariate analysis of plant community data has three goals: summarization of redundancy, identification of outliers, and elucidation of relationships. The first two are handled conveniently by initial fast clustering, and the third by subsequent ordination and hierarchical clustering, and perhaps table arrangement. Initial clustering algorithms should achieve within-cluster homogeneity and require minimal computer resources. However, algorithmic uniqueness and a hierarchy are not needed. Computing time should be proportional to the amount of data, with no higher dependencies on the number of samples. A method meeting these requirements is presented here, called composite clustering and implemented in a FORTRAN program called COMPCLUS. The computer time required for COMPCLUS clustering is on the order of the time required merely to read the data, regardless of the number of samples. Several large field data sets were analyzed effectively by using COMPCLUS to reduce redundancy and identify outliers, and then ordinating the resulting composite clusters by detrended correspondence analysis (DECORANA). Various clusterings of the same data set can be compared using a percent mutual matches (PMM) index, and a matrix of such values can be ordinated for simultaneous comparison of a number of clusterings. This paper benefited at many points from discussions with Mark O. Hill and Robert H. Whittaker. Mark Hill suggested condensed data storage. This work was done under a National Science Foundation grant to Robert Whittaker. I also appreciate technical assistance from Timothy F. Mason and Steven B. Singer.

9.
In this work we discuss the application of image processing and pattern recognition to the field of quantitative phycology. We give an overview of image processing and review previously published literature on the image analysis of phycological images and, in particular, cyanobacterial image processing. We then discuss the main operations used to process images and quantify the data contained within them. To demonstrate the utility of image processing for cyanobacteria classification, we present details of an image analysis system for automatically detecting and classifying several cyanobacterial taxa of Lake Biwa, Japan. Specifically, we initially target the genus Microcystis for detection and classification from among several species of Anabaena. We subsequently extend the system to classify a total of six cyanobacteria species. High-resolution microscope images containing a mix of the above species and other nontargeted objects are analyzed, and any detected objects are removed from the image for further analysis. Following image enhancement, we measure object properties and compare them to a previously compiled database of species characteristics. Classification of an object as belonging to a particular class (e.g., "Microcystis," "A. smithii," "Other," etc.) is performed using parametric statistical methods. Leave-one-out classification results suggest a system error rate of approximately 3%. Received: September 6, 1999 / Accepted: February 6, 2000

10.
Birds are considered critical indicators of ecosystem condition. Automatic recording devices have emerged as a popular tool to assist field observations, contributing to biodiversity monitoring on large spatio-temporal scales. However, manually processing huge volumes of recordings is challenging. Consequently, there has been growing interest in automatic bird vocalization recognition in recent years. Automatic bird vocalization recognition technology has advanced from classical pattern recognition to deep learning (DL), with significantly improved recognition performance. This paper reviews work on DL-based automatic bird vocalization recognition from the last decade. We present the current state of research in the three key areas of pre-processing, feature extraction and recognition methods involved in automatic bird vocalization recognition. The related datasets, evaluation metrics and software are also summarized. Finally, existing challenges and opportunities for future work are highlighted. We conclude that, while DL-based automatic bird vocalization recognition has made recent advances for specific species, more robust denoising approaches, larger public datasets, and stronger generalization capabilities of feature extraction and recognition are required to achieve reliable and general bird recognition in the wild. We expect that this review will serve as a firm foundation for new researchers working in the field of DL-based automatic bird vocalization recognition, and as an insightful guide for computer science and ecology experts.

11.
Myoelectric control systems—A survey   Cited by: 1 (self-citations: 0, citations by others: 1)
The development of advanced human-machine interfaces has long been a research topic of interest in the field of rehabilitation, in which biomedical signals, such as myoelectric signals, have a key role to play. Myoelectric control is an advanced technique concerned with the detection, processing, classification, and application of myoelectric signals to control human-assisting robots or rehabilitation devices. This paper reviews recent research and development in pattern recognition- and non-pattern recognition-based myoelectric control, and presents state-of-the-art achievements in terms of their type, structure, and potential application. Directions for future research are also briefly outlined.

12.
Multiple kernel learning (MKL) has been shown to be flexible and effective in describing heterogeneous data sources, since MKL can introduce multiple kernels rather than a single fixed kernel into applications. However, MKL incurs high time and space complexity compared with single kernel learning, which is undesirable in real-world applications. Meanwhile, it is known that kernel mapping in MKL generally takes one of two forms, implicit kernel mapping and empirical kernel mapping (EKM), of which the latter has attracted less attention. In this paper, we focus on MKL with the EKM and propose a reduced multiple empirical kernel learning machine, named RMEKLM for short. To the best of our knowledge, it is the first method to reduce both the time and space complexity of MKL with EKM. Unlike existing MKL methods, the proposed RMEKLM adopts the Gauss Elimination technique to extract a set of feature vectors, and it is validated that doing so does not lose much information of the original feature space. RMEKLM then uses the extracted feature vectors to span a reduced orthonormal subspace of the feature space, which is visualized in terms of its geometric structure. It can be demonstrated that the spanned subspace is isomorphic to the original feature space, which means that the dot product of two vectors in the original feature space is equal to that of the two corresponding vectors in the generated orthonormal subspace. More importantly, the proposed RMEKLM requires simpler computation and less storage space, especially during testing. Finally, the experimental results show that RMEKLM is both efficient and effective in terms of complexity and classification performance.
The contributions of this paper are as follows: (1) by mapping the input space into an orthonormal subspace, the geometry of the generated subspace is visualized; (2) this paper is the first to reduce both the time and space complexity of EKM-based MKL; (3) this paper adopts Gauss Elimination, an off-the-shelf technique, to generate a basis of the original feature space, which is stable and efficient.
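The dot-product-preserving property claimed above can be checked on a toy case. The sketch below is not RMEKLM and uses a single kernel: it assumes one possible reading of the elimination step, namely pivoted Cholesky factorization (symmetric Gaussian elimination with pivoting) of the kernel matrix, and a degree-2 polynomial kernel chosen only because its feature space has a small, known dimension. All function names are hypothetical.

```python
def poly_kernel(x, y, degree=2):
    """Inhomogeneous polynomial kernel (1 + <x, y>)^degree."""
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** degree


def select_basis(points, kernel, tol=1e-8):
    """Pivoted Cholesky on the kernel matrix.

    Returns the indices of points whose feature-space images are linearly
    independent, and the coordinates of every point in the orthonormal
    basis spanned by those images.
    """
    n = len(points)
    residual = [kernel(p, p) for p in points]  # squared norm not yet explained
    columns = []  # columns[j][i] = j-th orthonormal coordinate of point i
    basis = []
    while len(basis) < n:
        pivot = max(range(n), key=lambda i: residual[i])
        if residual[pivot] <= tol:
            break  # every remaining image already lies in the span
        scale = residual[pivot] ** 0.5
        column = []
        for i in range(n):
            k = kernel(points[i], points[pivot])
            k -= sum(c[i] * c[pivot] for c in columns)
            column.append(k / scale)
        columns.append(column)
        basis.append(pivot)
        residual = [r - v * v for r, v in zip(residual, column)]
    coords = [[c[i] for c in columns] for i in range(n)]
    return basis, coords


points = [(0, 0), (1, 0), (0, 1), (1, 1), (-1, 0.5), (0.5, -1),
          (-1, -1), (0.3, 0.7), (-0.4, 0.2), (0.8, -0.6)]
basis, coords = select_basis(points, poly_kernel)
dot = sum(a * b for a, b in zip(coords[3], coords[7]))
print(len(basis), dot, poly_kernel(points[3], points[7]))
```

For two-dimensional inputs this kernel has a six-dimensional feature space, so at most six of the ten points are kept as a basis, and the plain dot product of the reduced coordinates reproduces the kernel value up to the tolerance.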

13.
Summary: Pattern recognition techniques were applied to analytical data to distinguish abnormal from normal microbial fermentations using Bacillus amyloliquefaciens as a model system. Patterns of fermentation end products during growth of B. amyloliquefaciens were obtained from HPLC analysis of broth samples. Data were also obtained from fermentations using other bacterial species, strains, and environmental conditions, and were compared with the model data set. The bacterial species cultured included B. subtilis, B. licheniformis, and Escherichia coli. Environmental variables included aeration and temperature. The chromatographic patterns were compared by using hierarchical cluster and principal component analysis to obtain a quantitative measure of their similarity and to establish the normal variability within a model data set. Statistical analysis of the data indicated that individual fermentations can be assigned to distinct clusters on the basis of their divergence from the model system. Altered environments and other species can be identified as outliers from the model set. These results show that pattern recognition analysis has direct applicability to monitoring fermentation processes.

14.
Classifying monoclonal antibodies, based on the similarity of their binding to the proteins (antigens) on the surface of blood cells, is essential for progress in immunology, hematology and clinical medicine. The collaborative efforts of researchers from many countries have led to the classification of thousands of antibodies into 247 clusters of differentiation (CD). Classification is based on flow cytometry and biochemical data. In preliminary classifications of antibodies based on flow cytometry data, the object requiring classification (an antibody) is described by a set of random samples from unknown densities of fluorescence intensity. An individual sample is collected in an experiment in which a population of cells of a certain type is stained by identical fluorescently marked replicates of the antibody of interest. Samples are collected for multiple cell types. The classification problems of interest include identifying new CDs (class discovery or unsupervised learning) and assigning new antibodies to the known CD clusters (class prediction or supervised learning). These problems have attracted limited attention from statisticians. We recommend a novel approach to the classification process in which a computer algorithm suggests to the analyst the subset of the "most appropriate" classifications of an antibody in class prediction problems or the "most similar" pairs/groups of antibodies in class discovery problems. The suggested algorithm speeds up the analysis of flow cytometry data by a factor of 10-20. This allows the analyst to focus on the interpretation of the automatically suggested preliminary classification solutions and on planning the subsequent biochemical experiments.

15.
Novel biomarkers, in combination with currently available clinical information, have been sought to improve clinical decision making in many branches of medicine, including screening, surveillance, and prognosis. Statistical methods are needed to integrate such diverse information to develop targeted interventions that balance benefit and harm. In the specific setting of disease detection, we propose novel approaches to construct a multiple-marker-based decision rule by directly optimizing a benefit function, while controlling harm at a maximally tolerable level. These new approaches include plug-in and direct-optimization-based algorithms, and they allow for the construction of both nonparametric and parametric rules. A study of asymptotic properties of the proposed estimators is provided. Simulation results demonstrate good clinical utilities for the resulting decision rules under various scenarios. The methods are applied to a biomarker study in prostate cancer surveillance.
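The idea of maximizing benefit while capping harm can be made concrete with a deliberately simple sketch. The abstract defines neither quantity, so the code assumes that benefit is the true positive rate, harm is the false positive rate, and the markers have already been combined into a single score; the function name and the data are hypothetical, and this is not the authors' plug-in or direct-optimization estimator.

```python
def choose_threshold(scores, labels, max_harm):
    """Return (cutoff, benefit, harm) maximizing the true positive rate
    subject to false positive rate <= max_harm.

    A subject is flagged when score >= cutoff. Labels are 1 (diseased)
    or 0 (not diseased); both classes are assumed to be present.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    best = (float("inf"), 0.0, 0.0)  # default rule: flag nobody
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        benefit, harm = tp / positives, fp / negatives
        if harm <= max_harm and benefit > best[1]:
            best = (cutoff, benefit, harm)
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(choose_threshold(scores, labels, max_harm=0.25))
```

On this toy data the rule flags scores of 0.6 and above, detecting three of the four diseased subjects while flagging one of the four non-diseased subjects, which is exactly the tolerated harm level.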

16.
A dynamic treatment regime (DTR) is a sequence of decision rules that provide guidance on how to treat individuals based on their static and time-varying status. Existing observational data are often used to generate hypotheses about effective DTRs. A common challenge with observational data, however, is the need for analysts to consider "restrictions" on the treatment sequences. Such restrictions may be necessary in settings where (1) one or more treatment sequences that were offered to individuals when the data were collected are no longer considered viable in practice, (2) specific treatment sequences are no longer available, or (3) the scientific focus of the analysis concerns a specific type of treatment sequence (e.g., "stepped-up" treatments). To address this challenge, we propose a restricted tree-based reinforcement learning (RT-RL) method that searches for an interpretable DTR with the maximum expected outcome, given a (set of) user-specified restriction(s), which specifies treatment options (at each stage) that ought not to be considered as part of the estimated tree-based DTR. In simulations, we evaluate the performance of RT-RL versus the standard approach of ignoring the partial data for individuals not following the (set of) restriction(s). The method is illustrated using an observational data set to estimate a two-stage stepped-up DTR for guiding the level of care placement for adolescents with substance use disorder.

17.
Studies have shown that Parkinson's disease, epilepsy and other brain disorders are closely related to the ability of neurons to synchronize with their neighbors. The neurobiological mechanism and synchronization behavior of neurons have therefore attracted much attention in recent years. In this contribution, the complex nonlinear behaviour of the Hindmarsh-Rose neuron system is investigated numerically through time responses, the system bifurcation diagram and the Lyapunov exponent under different system parameters. The system exhibits different and complex dynamic behaviors as the parameters vary. Then, the identification of the nonlinear dynamics and topologies of Hindmarsh-Rose neural networks in an unknown dynamical environment is discussed. Using the deterministic learning algorithm, the unknown dynamics and topologies of the Hindmarsh-Rose system are identified with local accuracy. Additionally, the identified system dynamics can be stored and represented in the form of constant neural networks owing to the convergence of the system parameters. Finally, based on this time-invariant representation of the system dynamics, a fast dynamical pattern recognition method via system synchronization is constructed. The results of this work may provide further incentives and possibilities for biological experiments and medical treatment as well as other related clinical research, such as quantifying and explaining neurobiological mechanisms, and the early diagnosis, classification and control (treatment) of neurological diseases such as Parkinson's disease and epilepsy. Simulations are included to verify the effectiveness of the proposed method.
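A time response of a single Hindmarsh-Rose neuron, the first of the numerical tools mentioned in this abstract, can be sketched as follows. The three equations are the standard form of the model; the parameter values, input current, step size and initial state are commonly quoted illustrative choices and are assumptions, not values taken from the paper. The bifurcation diagram, Lyapunov exponent and deterministic learning steps are not covered.

```python
def hindmarsh_rose(state, current, a=1.0, b=3.0, c=1.0, d=5.0,
                   r=0.006, s=4.0, x_rest=-1.6):
    """Right-hand side of the Hindmarsh-Rose model.

    x: membrane potential, y: fast recovery variable, z: slow adaptation.
    Default parameters are illustrative assumptions.
    """
    x, y, z = state
    dx = y - a * x ** 3 + b * x ** 2 - z + current
    dy = c - d * x ** 2 - y
    dz = r * (s * (x - x_rest) - z)
    return (dx, dy, dz)


def simulate(current=3.25, dt=0.01, steps=20000, state=(0.1, 0.0, 0.0)):
    """Integrate with the classical fourth-order Runge-Kutta scheme and
    return the membrane potential trace."""
    trace = []
    for _ in range(steps):
        k1 = hindmarsh_rose(state, current)
        k2 = hindmarsh_rose(tuple(v + 0.5 * dt * k for v, k in zip(state, k1)), current)
        k3 = hindmarsh_rose(tuple(v + 0.5 * dt * k for v, k in zip(state, k2)), current)
        k4 = hindmarsh_rose(tuple(v + dt * k for v, k in zip(state, k3)), current)
        state = tuple(
            v + dt / 6.0 * (p + 2.0 * q + 2.0 * u + w)
            for v, p, q, u, w in zip(state, k1, k2, k3, k4)
        )
        trace.append(state[0])
    return trace


x = simulate()
print(min(x), max(x))
```

Re-running `simulate` over a range of `current` or `r` values and recording the spike maxima of each trace is one simple way to build the kind of bifurcation diagram the abstract refers to.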

18.
This work adopts the idea that, besides the motor unit (MU) properties and depending on the recording technique, motor unit action potentials (MUAPs) can have a unique pattern. The aim was to recognise whether a Laplacian-detected MUAP is isolated or overlapped, based on novel morphological features and using a fuzzy classifier. A training data set was constructed to elaborate and test the "if-then" fuzzy rules using signals from three muscles, the abductor pollicis brevis (APB), the first dorsal interosseous (FDI) and the biceps brachii (BB), of 11 healthy subjects. The proposed fuzzy classifier automatically recognised the isolated MUAPs with a performance of 95.03%, which was improved to 97.8% by adjusting the certainty grades of the rules using genetic algorithms (GA). Synthetic signals were used as a reference to further evaluate the performance of the elaborated classifier. The recognition of isolated MUAPs depends largely on the noise level and is acceptable down to a signal-to-noise ratio of 20 dB, with a detection probability of 0.96. The recognition of overlapped MUAPs depends only slightly on the noise level, with a detection probability of about 0.8. The corresponding misrecognition is caused principally by synchronisation and a small degree of overlap.
