Similar literature
20 similar documents found (search time: 31 ms)
1.
This study developed a methodology to temporally classify large-scale, upper-level atmospheric conditions over North America, utilizing a newly developed upper-level synoptic classification (ULSC). Four meteorological variables (geopotential height, specific humidity, and u- and v-wind components) at the 500 hPa level over North America were obtained from the NCEP/NCAR Reanalysis Project dataset for the period 1965-1974. These data were subjected to principal components analysis to standardize and reduce the dataset, and then an average linkage clustering algorithm identified groups of observations with similar flow patterns. The procedure yielded 16 clusters. These flow patterns identified by the ULSC typify all patterns expected to be observed over the study area. Additionally, the resulting cluster calendar for the period 1965-1974 showed that the clusters are generally temporally continuous. Subsequent classification of additional observations through a z-score method produced acceptable results, indicating that additional observations may easily be incorporated into the ULSC calendar. The ULSC calendar of synoptic conditions can be used to identify situations that lead to periods of extreme weather, such as heat waves, flooding, and droughts, and to explore long-distance dispersal of airborne particles and biota across North America.

2.
When analyzing the results of microarray experiments, biologists generally use unsupervised categorization tools. However, such tools regard each time point as an independent dimension and utilize the Euclidean distance to compute the similarities between expressions. Furthermore, some of these methods require the number of clusters to be determined in advance, which is clearly impossible in the case of a new dataset. Therefore, this study proposes a novel scheme, designated as the Variation-based Coexpression Detection (VCD) algorithm, to analyze the trends of expressions based on their variation over time. The proposed algorithm has two advantages. First, it is unnecessary to determine the number of clusters in advance since the algorithm automatically detects those genes whose profiles are grouped together and creates patterns for these groups. Second, the algorithm features a new measurement criterion for calculating the degree of change of the expressions between adjacent time points and evaluating their trend similarities. Three real-world microarray datasets are employed to evaluate the performance of the proposed algorithm.

3.
A novel sensor was developed, based on light scatter, to estimate the cell concentration in the presence of suspended solids. The light scatter properties of cells in the presence of suspended solids were investigated. Two crucial observations were made: first, that the light scatter from cells is essentially a linear function of cell concentration and, second, that invariant regions are present in the light scatter spectrum of cell/solid substrate mixtures. Invariant regions are wavelength intervals of the light scatter spectrum in which the light scatter reading is independent of solid substrate concentration and only a function of cell concentration. The occurrence of invariant regions is the key behavior which allowed the quantification of cell concentration in the presence of suspended solids. An algorithm was developed for the estimation, from light scatter data, of cell concentration in the presence of solid substrate. The light scatter approach was validated by comparing cell concentrations estimated by this technique to those obtained from DNA and carbon dioxide evolution rate measurements during a series of fermentations. The model system used was Bacillus subtilis var sakainensis ATCC 21394 growing on fishmeal as the sole nitrogen source. A model was developed based on the interactions of scatter and absorbance. This model reflects the hypothesis that invariant regions are caused by changes in the absorbance of the solid substrate as a function of wavelength. © 1992 John Wiley & Sons, Inc.

4.
BACKGROUND: Artificial neural networks (ANNs) have been shown to be valuable in the analysis of analytical flow cytometric (AFC) data in aquatic ecology. Automated extraction of clusters is an important first stage in deriving ANN training data from field samples, but AFC data pose a number of challenges for many types of clustering algorithm. The fuzzy k-means algorithm recently has been extended to address nonspherical clusters with the use of scatter matrices. Four variants were proposed, each optimizing a different measure of clustering "goodness." METHODS: With AFC data obtained from marine phytoplankton species in culture, the four fuzzy k-means algorithm variants were compared with each other and with another multivariate clustering algorithm based on critical distances currently used in flow cytometry. RESULTS: One of the algorithm variants (adaptive distances, also known as the Gustafson-Kessel algorithm) was found to be robust and reliable, whereas the others showed various problems. CONCLUSIONS: The adaptive distances algorithm was superior in use to the clustering algorithms against which it was tested, but the problem of automatic determination of the number of clusters remains to be addressed.

5.
Validating clustering for gene expression data (total citations: 24; self-citations: 0; citations by others: 24)
MOTIVATION: Many clustering algorithms have been proposed for the analysis of gene expression data, but little guidance is available to help choose among them. We provide a systematic framework for assessing the results of clustering algorithms. Clustering algorithms attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. Our methodology is to apply a clustering algorithm to the data from all but one experimental condition. The remaining condition is used to assess the predictive power of the resulting clusters: meaningful clusters should exhibit less variation in the remaining condition than clusters formed by chance. RESULTS: We successfully applied our methodology to compare six clustering algorithms on four gene expression data sets. We found our quantitative measures of cluster quality to be positively correlated with external standards of cluster quality.
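The leave-one-condition-out idea described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the exact quality measure, so the root-mean-square deviation used here is an assumption, and `left_out_deviation`, `leave_one_out_score`, `sign_clusters`, and `single_cluster` are hypothetical names introduced for the example.

```python
from math import sqrt


def left_out_deviation(data, labels, left_out):
    """RMS deviation from the cluster mean, measured only in the held-out condition.

    Assumption: the abstract does not state the formula; RMS is one plausible choice.
    """
    groups = {}
    for row, label in zip(data, labels):
        groups.setdefault(label, []).append(row[left_out])
    total = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return sqrt(total / len(data))


def leave_one_out_score(data, cluster_fn):
    """Cluster on all but one condition, score on the held-out one, sum over conditions."""
    n_conditions = len(data[0])
    score = 0.0
    for left_out in range(n_conditions):
        # Drop the held-out column before clustering.
        reduced = [row[:left_out] + row[left_out + 1:] for row in data]
        labels = cluster_fn(reduced)
        score += left_out_deviation(data, labels, left_out)
    return score


def sign_clusters(rows):
    """Hypothetical stand-in for a real clustering algorithm: split by sign of the row sum."""
    return [0 if sum(r) >= 0 else 1 for r in rows]


def single_cluster(rows):
    """Baseline that puts every gene in one cluster."""
    return [0] * len(rows)


# Rows are genes, columns are experimental conditions (toy values).
data = [
    [1.0, 1.2, 0.9],
    [1.1, 0.8, 1.0],
    [-1.0, -1.1, -0.9],
    [-0.9, -1.2, -1.0],
]
```

Under this sketch, a lower score means the clusters predict the held-out condition better; on the toy data, `leave_one_out_score(data, sign_clusters)` comes out lower than `leave_one_out_score(data, single_cluster)`.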

6.
We present an oscillator network model for the synchronization of oscillatory neuronal activity underlying visual processing. The single neuron is modeled by means of a limit cycle oscillator with an eigenfrequency corresponding to visual stimulation. The eigenfrequency may be time dependent. The mutual coupling strengths are unsymmetrical and activity dependent, and they scatter within the network. Synchronized clusters (groups) of neurons emerge in the network due to the visual stimulation. The different clusters correspond to different visual stimuli. There is no limitation of the number of stimuli. Distinct clusters do not perturb each other, although the coupling strength between all model neurons is of the same order of magnitude. Our analysis is not restricted to weak coupling strength. The scatter of the couplings causes shifts of the cluster frequencies. The model's behavior is compared with the experimental findings. The coupling mechanism is extended in order to model the influence of bicucullin upon the neural network. We additionally investigate repulsive couplings, which lead to constant phase differences between clusters of the same frequency. Finally, we consider the problem of selective attention from the viewpoint of our model.

7.
The aim of this paper is to present a new clustering algorithm for short time-series gene expression data that is able to characterise temporal relations in the clustering environment (i.e., data-space), which is not achieved by other conventional clustering algorithms such as k-means or hierarchical clustering. The algorithm, called fuzzy c-varieties clustering with transitional state discrimination preclustering (FCV-TSD), is a two-step approach which identifies groups of points ordered in a line configuration in particular locations and orientations of the data-space that correspond to similar expressions in the time domain. We present the validation of the algorithm with both artificial and real experimental datasets, where k-means and random clustering are used for comparison. The performance was evaluated with a measure for internal cluster correlation and the geometrical properties of the clusters, showing that the FCV-TSD algorithm had better performance than the k-means algorithm on both datasets.

8.
We present an oscillator network model for the synchronization of oscillatory neuronal activity underlying visual processing. The single neuron is modeled by means of a limit cycle oscillator with an eigenfrequency corresponding to visual stimulation. The eigenfrequency may be time dependent. The mutual coupling strengths are unsymmetrical and activity dependent, and they scatter within the network. Synchronized clusters (groups) of neurons emerge in the network due to the visual stimulation. The different clusters correspond to different visual stimuli. There is no limitation of the number of stimuli. Distinct clusters do not perturb each other, although the coupling strength between all model neurons is of the same order of magnitude. Our analysis is not restricted to weak coupling strength. The scatter of the couplings causes shifts of the cluster frequencies. The model’s behavior is compared with the experimental findings. The coupling mechanism is extended in order to model the influence of bicucullin upon the neural network. We additionally investigate repulsive couplings, which lead to constant phase differences between clusters of the same frequency. Finally, we consider the problem of selective attention from the viewpoint of our model. Received: 15 February 1995/Accepted in revised form: 18 July 1995

9.
Backbone cluster identification in proteins by a graph theoretical method (total citations: 4; self-citations: 0; citations by others: 4)
A graph theoretical algorithm has been developed to identify backbone clusters of residues in proteins. The identified clusters show protein sites with the highest degree of interactions. An adjacency matrix is constructed from the non-bonded connectivity information in proteins. The diagonalization of such a matrix yields eigenvalues and eigenvectors, which contain the information on clusters. In graph theory, distinct clusters can be obtained from the second lowest eigenvector components of the matrix. However, in an interconnected graph, all the points appear as one single cluster. We have developed a method of identifying highly interacting centers (clusters) in proteins by truncating the vector components of high eigenvalues. This paper presents in detail the method adopted for identifying backbone clusters and the application of the algorithm to families of proteins like RNase-A and globin. The objective of this study was to show the efficiency of the algorithm as well as to detect conserved or similar backbone packing regions in a particular protein family. Three clusters in topologically similar regions in the case of the RNase-A family and three clusters around the porphyrin ring in the globin family were observed. The predicted clusters are consistent with the features of the family of proteins such as the topology and packing density. The method can be applied to problems such as identification of domains and recognition of structural similarities in proteins.

10.
In this paper, we propose a hybrid clustering method that combines the strengths of bottom-up hierarchical clustering with those of top-down clustering. The first method is good at identifying small clusters but not large ones; the strengths are reversed for the second method. The hybrid method is built on the new idea of a mutual cluster: a group of points closer to each other than to any other points. Theoretical connections between mutual clusters and bottom-up clustering methods are established, aiding in their interpretation and providing an algorithm for identification of mutual clusters. We illustrate the technique on simulated and real microarray datasets.
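The one-sentence definition of a mutual cluster in this abstract can be made concrete with a small check. The sketch below assumes one reading of that definition, namely that the largest distance between any two members is smaller than the smallest distance from a member to any non-member; the abstract itself does not spell this out, and `is_mutual_cluster` is a hypothetical name. It only tests a given candidate group and is not the identification algorithm the paper refers to.

```python
from itertools import combinations
from math import dist


def is_mutual_cluster(points, members):
    """Return True if `members` (indices into `points`) form a mutual cluster.

    Assumed reading of the definition: the group's diameter (largest
    within-group distance) is smaller than the smallest distance from
    any member to any point outside the group.
    """
    members = set(members)
    inside = [points[i] for i in members]
    outside = [p for i, p in enumerate(points) if i not in members]
    if len(inside) < 2 or not outside:
        return False  # degenerate cases are not treated as mutual clusters here
    diameter = max(dist(a, b) for a, b in combinations(inside, 2))
    gap = min(dist(a, b) for a in inside for b in outside)
    return diameter < gap


# Toy data: a tight triple near the origin and a tight pair far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
```

With these toy points, the groups `{0, 1, 2}` and `{3, 4}` satisfy the check, while a group mixing the two regions, such as `{0, 3}`, does not.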

11.
Inference from clustering with application to gene-expression microarrays (total citations: 7; self-citations: 0; citations by others: 7)
There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to the process to which they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. A large amount of generated output is available over the web.
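The simulate-then-score loop described in this abstract (mean plus independent noise, then counting incorrectly clustered points) might look roughly like the sketch below. It is not the toolbox itself: Gaussian noise, equal class sizes, and resolving the arbitrary cluster labels by taking the best one-to-one relabelling are all assumptions made for the example, and `generate` and `clustering_error` are hypothetical names.

```python
import random
from itertools import permutations


def generate(means, sigma, n_per_class, rng):
    """Draw points as class mean plus independent Gaussian noise (assumed noise model)."""
    points, classes = [], []
    for label, mean in enumerate(means):
        for _ in range(n_per_class):
            points.append([m + rng.gauss(0.0, sigma) for m in mean])
            classes.append(label)
    return points, classes


def clustering_error(classes, clusters, k):
    """Number of misclustered points under the best one-to-one relabelling.

    Both label lists are assumed to use the integers 0..k-1. Cluster labels
    are arbitrary, so every permutation is tried; this is fine for small k only.
    """
    best = len(classes)
    for perm in permutations(range(k)):
        wrong = sum(1 for t, c in zip(classes, clusters) if perm[c] != t)
        best = min(best, wrong)
    return best


rng = random.Random(0)  # fixed seed so the toy data are reproducible
points, classes = generate([[0.0, 0.0], [5.0, 5.0]], 0.5, 3, rng)
```

Any clustering routine that returns integer labels for `points` can then be scored with `clustering_error(classes, labels, k=2)`, and the experiment repeated across seeds and noise levels.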

12.
In randomized trials or observational studies involving clustered units, the assumption of independence within clusters is not practical. Existing parametric or semiparametric methods assume specific dependence structures within a cluster. Furthermore, parametric model assumptions may not even be realistic when data are measured in a nonmetric scale as commonly happens, for example, in quality-of-life outcomes. In this paper, nonparametric effect-size measures for clustered data that allow meaningful and interpretable probabilistic comparisons of treatments or intervention programs will be introduced. The dependence among observations within a cluster can be arbitrary. Point estimators along with their asymptotic properties for computing confidence intervals and performing hypothesis tests will be discussed. Small sample approximations that retain some of the optimal asymptotic behaviors will be presented. In our setup, some clusters may involve observations coming from both intervention groups (referred to as complete clusters), while others may contain observations from one group only (referred to as incomplete clusters). In deriving the asymptotic theories, we do not impose any relation in the rate of divergence of the numbers of complete and incomplete clusters. Simulations show favorable performance of the methods for arbitrary combinations of complete and incomplete clusters. The developed nonparametric methods are illustrated using data from a randomized trial of indoor wood smoke reduction to improve asthma symptoms and a cluster-randomized trial for smoking cessation.

13.
Tseng GC, Wong WH. Biometrics 2005, 61(1):10-16
In this article, we propose a method for clustering that produces tight and stable clusters without forcing all points into clusters. The methodology is general but was initially motivated by cluster analysis of microarray experiments. Most current algorithms aim to assign all genes into clusters. For many biological studies, however, we are mainly interested in identifying the most informative, tight, and stable clusters of sizes, say, 20-60 genes for further investigation. We want to avoid the contamination of tightly regulated expression patterns of biologically relevant genes due to other genes whose expressions are only loosely compatible with these patterns. "Tight clustering" has been developed specifically to address this problem. It applies K-means clustering as an intermediate clustering engine. Early truncation of a hierarchical clustering tree is used to overcome the local minimum problem in K-means clustering. The tightest and most stable clusters are identified in a sequential manner through an analysis of the tendency of genes to be grouped together under repeated resampling.  We validated this method in a simulated example and applied it to analyze a set of expression profiles in the study of embryonic stem cells.
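The phrase "tendency of genes to be grouped together under repeated resampling" can be illustrated with a simple co-membership tally. The sketch below is only a loose illustration of that one ingredient, not the tight clustering procedure: it assumes each resampling run has already been clustered by some method and is supplied as a mapping from item index to cluster label, and `comembership` is a hypothetical name.

```python
from itertools import combinations


def comembership(runs):
    """Fraction of runs in which each pair of items lands in the same cluster.

    `runs` is a list of dicts mapping item index -> cluster label; a run only
    contains the items that were drawn in that resample. A pair is counted
    only over the runs in which both of its items appear.
    """
    together = {}
    seen = {}
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            seen[(i, j)] = seen.get((i, j), 0) + 1
            if labels[i] == labels[j]:
                together[(i, j)] = together.get((i, j), 0) + 1
    return {pair: together.get(pair, 0) / count for pair, count in seen.items()}


# Three toy resampling runs over items 0, 1, 2 (cluster labels are arbitrary per run).
runs = [
    {0: "a", 1: "a", 2: "b"},
    {0: "x", 1: "x"},
    {0: "q", 1: "p", 2: "p"},
]
freq = comembership(runs)
```

Pairs whose frequency stays close to 1 across runs are candidates for a stable group, while pairs with low frequency are only loosely associated.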

14.
There are many methods available to predict electron output factors; however, many centres still measure the factors for each irregular electron field. Creating an electron output factor prediction model that approaches measurement accuracy, but uses already available data and is simple to implement, would be advantageous in the clinical setting. This work presents an empirical spline model for output factor prediction that requires only the measured factors for arbitrary insert shapes. Equivalent ellipses of the insert shapes are determined and then parameterised by width and ratio of perimeter to area. This takes into account changes in lateral scatter, bremsstrahlung produced in the insert material, and scatter from the edge of the insert. Agreement between prediction and measurement for the 12 MeV validation data had an uncertainty of 0.4% (1SD). The maximum recorded deviation between measurement and prediction over the range of energies was 1.0%. The validation methodology showed that one may expect an approximate uncertainty of 0.5% (1SD) when as few as eight data points are used. The level of accuracy combined with the ease with which this model can be generated demonstrates its suitability for clinical use. Implementation of this method is freely available for download at https://github.com/SimonBiggs/electronfactors.

15.
Summary: This article develops a latent model and likelihood-based inference to detect temporal clustering of events. The model mimics typical processes generating the observed data. We apply model selection techniques to determine the number of clusters, and develop likelihood inference and a Monte Carlo expectation–maximization algorithm to estimate model parameters, detect clusters, and identify cluster locations. Our method differs from the classical scan statistic in that we can simultaneously detect multiple clusters of varying sizes. We illustrate the methodology with two real data applications and evaluate its efficiency through simulation studies. For the typical data-generating process, our methodology is more efficient than a competing procedure that relies on least squares.

16.
Measures of biodiversity are often hindered by a lack of methodological practices that distinguish cryptic or morphologically similar cohabiting species. This is particularly difficult for marine fishes where direct observations of the ecology and demography of populations are difficult. Dragonets (Foetorepus cf. calauropomus) were collected as bycatch from research trawls deployed in waters off north-eastern Tasmania, Australia. Morphometric and genetic analyses were conducted on the 43 specimens recovered. Sequence analysis of two mitochondrial loci distinguished three genetic clusters, each having levels of dissimilarity consistent with species-level distinctions between other members of the Callionymidae. While clear morphological distinctions were observed between male and female fish, limited morphometric analyses could not differentiate between members of the three genetic groups. This finding highlights questions about the ability of genetically distinct but morphologically similar groups to occupy the same ecological niche, and points to additional and undescribed hidden biodiversity amongst cryptic species of fish.

17.
Identifying clusters of functionally related genes in genomes (total citations: 4; self-citations: 0; citations by others: 4)
MOTIVATION: An increasing body of literature shows that genomes of eukaryotes can contain clusters of functionally related genes. Most approaches to identify gene clusters utilize microarray data or metabolic pathway databases to find groups of genes on chromosomes that are linked by common attributes. A generalized method that can find gene clusters regardless of the mechanism of origin would provide researchers with an unbiased method for finding clusters and studying the evolutionary forces that give rise to them. RESULTS: We present an algorithm to identify gene clusters in eukaryotic genomes that utilizes functional categories defined in graph-based vocabularies such as the Gene Ontology (GO). Clusters identified in this manner need only have a common function and are not constrained by gene expression or other properties. We tested the algorithm by analyzing genomes of a representative set of species. We identified species-specific variation in percentage of clustered genes as well as in properties of gene clusters including size distribution and functional annotation. These properties may be diagnostic of the evolutionary forces that lead to the formation of gene clusters. AVAILABILITY: A software implementation of the algorithm and example output files are available at http://fcg.tamu.edu/C_Hunter/.

18.
This paper presents an attribute clustering method which is able to group genes based on their interdependence so as to mine meaningful patterns from the gene expression data. It can be used for gene grouping, selection, and classification. The partitioning of a relational table into attribute subgroups allows a small number of attributes within or across the groups to be selected for analysis. By clustering attributes, the search dimension of a data mining algorithm is reduced. The reduction of search dimension is especially important to data mining in gene expression data because such data typically consist of a huge number of genes (attributes) and a small number of gene expression profiles (tuples). Most data mining algorithms are typically developed and optimized to scale to the number of tuples instead of the number of attributes. The situation becomes even worse when the number of attributes overwhelms the number of tuples, in which case the likelihood of reporting patterns that are actually irrelevant due to chance becomes rather high. It is for the aforementioned reasons that gene grouping and selection are important preprocessing steps for many data mining algorithms to be effective when applied to gene expression data. This paper defines the problem of attribute clustering and introduces a methodology for solving it. Our proposed method groups interdependent attributes into clusters by optimizing a criterion function derived from an information measure that reflects the interdependence between attributes. By applying our algorithm to gene expression data, meaningful clusters of genes are discovered. The grouping of genes based on attribute interdependence within each group helps to capture different aspects of gene association patterns in each group. Significant genes selected from each group then contain useful information for gene expression classification and identification.
To evaluate the performance of the proposed approach, we applied it to two well-known gene expression data sets and compared our results with those obtained by other methods. Our experiments show that the proposed method is able to find meaningful clusters of genes. By selecting a subset of genes which have high multiple-interdependence with others within clusters, significant classification information can be obtained. Thus, a small pool of selected genes can be used to build classifiers with a very high classification rate. From the pool, gene expressions of different categories can be identified.

19.
Robust regression for clustered data with application to binary responses (total citations: 3; self-citations: 0; citations by others: 3)
Preisser JS, Qaqish BF. Biometrics 1999, 55(2):574-579
Generalized estimating equations (GEE) can be highly influenced by the presence of unusual data points. A generalization of the GEE procedure, which yields parameter estimates and fitted values that are resistant to influential data, is introduced. Resistant generalized estimating equations (REGEE) include weights in the estimating equations to downweight influential observations or clusters. Influential observations are downweighted according to their leverage or residual in an example of correlated binary regression applied to 137 urinary incontinent elderly patients from 38 medical practices.

20.
We propose an algorithm for selecting and clustering genes according to their time-course or dose-response profiles using gene expression data. The proposed algorithm is based on the order-restricted inference methodology developed in statistics. We describe the methodology for time-course experiments although it is applicable to any ordered set of treatments. Candidate temporal profiles are defined in terms of inequalities among mean expression levels at the time points. The proposed algorithm selects genes when they meet a bootstrap-based criterion for statistical significance and assigns each selected gene to the best fitting candidate profile. We illustrate the methodology using data from a cDNA microarray experiment in which a breast cancer cell line was stimulated with estrogen for different time intervals. In this example, our method was able to identify several biologically interesting genes that previous analyses failed to reveal.
