Similar literature
 20 similar articles found (search time: 0 ms)
1.
MOTIVATION: Unsupervised analysis of microarray gene expression data attempts to find biologically significant patterns within a given collection of expression measurements. For example, hierarchical clustering can be applied to expression profiles of genes across multiple experiments, identifying groups of genes that share similar expression profiles. Previous work using the support vector machine supervised learning algorithm with microarray data suggests that higher-order features, such as pairwise and tertiary correlations across multiple experiments, may provide significant benefit in learning to recognize classes of co-expressed genes. RESULTS: We describe a generalization of the hierarchical clustering algorithm that efficiently incorporates these higher-order features by using a kernel function to map the data into a high-dimensional feature space. We then evaluate the utility of the kernel hierarchical clustering algorithm using both internal and external validation. The experiments demonstrate that the kernel representation itself is insufficient to provide improved clustering performance. We conclude that mapping gene expression data into a high-dimensional feature space is only a good idea when combined with a learning algorithm, such as the support vector machine, that does not suffer from the curse of dimensionality. AVAILABILITY: Supplementary data at www.cs.columbia.edu/compbio/hiclust. Software source code available by request.
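
As a rough illustration of the kernel mapping described in entry 1, the sketch below computes distances in the feature space implied by a kernel, using the standard identity d(x, y)^2 = K(x, x) - 2K(x, y) + K(y, y). The polynomial kernel, its degree, and the function names are assumptions chosen for illustration; the abstract does not state which kernel or linkage the authors used. The resulting matrix could be passed to any ordinary hierarchical clustering routine.

```python
import math


def poly_kernel(x, y, degree=2):
    """Polynomial kernel (1 + <x, y>)^degree. The degree is an illustrative choice."""
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** degree


def kernel_distance(x, y, kernel=poly_kernel):
    """Distance between x and y in the feature space implied by `kernel`."""
    squared = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding errors


def kernel_distance_matrix(profiles, kernel=poly_kernel):
    """Pairwise feature-space distances for a list of expression profiles."""
    n = len(profiles)
    return [[kernel_distance(profiles[i], profiles[j], kernel) for j in range(n)]
            for i in range(n)]
```

With a plain dot-product kernel the same function reduces to the ordinary Euclidean distance, which is a convenient sanity check.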

2.
In this paper, we propose a hybrid clustering method that combines the strengths of bottom-up hierarchical clustering with those of top-down clustering. The first method is good at identifying small clusters but not large ones; the strengths are reversed for the second method. The hybrid method is built on the new idea of a mutual cluster: a group of points closer to each other than to any other points. Theoretical connections between mutual clusters and bottom-up clustering methods are established, aiding in their interpretation and providing an algorithm for identification of mutual clusters. We illustrate the technique on simulated and real microarray datasets.
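
The mutual-cluster idea in entry 2 can be made concrete with a small check. The sketch below assumes one plausible formalization: a group is a mutual cluster when its largest within-group distance is smaller than the smallest distance from any member to any outside point. The abstract gives only the informal wording, so this reading, the Euclidean metric, and the function name are assumptions.

```python
import math


def is_mutual_cluster(points, members):
    """Return True if the index set `members` forms a mutual cluster in `points`.

    Assumed criterion: max distance within the group < min distance from any
    member to any point outside the group (Euclidean distance).
    """
    inside = sorted(members)
    outside = [i for i in range(len(points)) if i not in members]
    if len(inside) < 2 or not outside:
        return False  # need at least a pair, and something to compare against
    diameter = max(math.dist(points[i], points[j])
                   for i in inside for j in inside if i < j)
    gap = min(math.dist(points[i], points[k])
              for i in inside for k in outside)
    return diameter < gap
```

For example, with points on a line at 0, 1, 5, 10 and 11, the pair at 0 and 1 passes the check, while the triple at 0, 1 and 5 does not.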

3.
4.
MOTIVATION: We describe a new approach to the analysis of gene expression data coming from DNA array experiments, using an unsupervised neural network. DNA array technologies allow monitoring thousands of genes rapidly and efficiently. One aim of these studies is the search for correlated gene expression patterns, and this is usually achieved by clustering them. The Self-Organising Tree Algorithm (SOTA) (Dopazo, J. and Carazo, J.M. (1997) J. Mol. Evol., 44, 226-233) is a neural network that grows adopting the topology of a binary tree. The result of the algorithm is a hierarchical cluster obtained with the accuracy and robustness of a neural network. RESULTS: SOTA clustering confers several advantages over classical hierarchical clustering methods. SOTA is a divisive method: the clustering process is performed from top to bottom, i.e. the highest hierarchical levels are resolved before going to the details of the lowest levels. The growing can be stopped at the desired hierarchical level. Moreover, a criterion to stop the growing of the tree, based on the approximate distribution of probability obtained by randomisation of the original data set, is provided. By means of this criterion, a statistical support for the definition of clusters is proposed. In addition, obtaining average gene expression patterns is a built-in feature of the algorithm. Different neurons defining the different hierarchical levels represent the averages of the gene expression patterns contained in the clusters. Since SOTA runtimes are approximately linear with the number of items to be classified, it is especially suitable for dealing with huge amounts of data. The method proposed is very general and applies to any data provided that they can be coded as a series of numbers and that a computable measure of similarity between data items can be used. AVAILABILITY: A server running the program can be found at: http://bioinfo.cnio.es/sotarray.

5.
When applying hierarchical clustering algorithms to cluster patient samples from microarray data, the clustering patterns generated by most algorithms tend to be dominated by groups of highly differentially expressed genes that have closely related expression patterns. Sometimes, these genes may not be relevant to the biological process under study or their functions may already be known. The problem is that these genes can potentially drown out the effects of other genes that are relevant or have novel functions. We propose a procedure called complementary hierarchical clustering that is designed to uncover the structures arising from these novel genes that are not as highly expressed. Simulation studies show that the procedure is effective when applied to a variety of examples. We also define a concept called relative gene importance that can be used to identify the influential genes in a given clustering. Finally, we analyze a microarray data set from 295 breast cancer patients, using clustering with the correlation-based distance measure. The complementary clustering reveals a grouping of the patients which is uncorrelated with a number of known prognostic signatures and has significantly differing distant metastasis-free probabilities.

6.
Aim

Do species range shapes follow general patterns? If so, what mechanisms underlie those patterns? We show for 11,582 species from a variety of taxa across the world that most species have similar latitudinal and longitudinal ranges. We then seek to disentangle the roles of climate, extrinsic dispersal limitation (e.g. barriers) and intrinsic dispersal limitation (reflecting a species’ ability to disperse) as constraints of species range shape. We also assess the relationship between range size and shape.

Location

Global.

Methods

Range shape patterns were measured as the slope of the regression of latitudinal species ranges against longitudinal ranges for each taxon and continent, and as the coefficient of determination measuring the degree of scattering of species ranges from the 1:1 line (i.e. latitudinal range = longitudinal range). Two major competing hypotheses explaining species distributions (i.e. dispersal or climatic determinism) were explored. To this end, we compared the observed slopes and coefficients of determination with those predicted by a climatic null model that estimates the potential range shapes in the absence of dispersal limitation. The predictions compared were that species distribution shapes are determined purely by (1) intrinsic dispersal limitation, (2) extrinsic dispersal limitations such as topographic barriers, and (3) climate.

Results

Using this methodology, we show for a wide variety of taxa across the globe that species generally have very similar latitudinal and longitudinal ranges. However, neither neutral models assuming random but spatially constrained dispersal, nor models assuming climatic control of species distributions describe range shapes adequately. The empirical relationship between the latitudinal and longitudinal ranges of species falls between the predictions of these competing models.

Main conclusions

We propose that this pattern arises from the combined effect of macroclimate and intrinsic dispersal limitation, the latter being the major determinant among restricted‐range species. Hence, accurately projecting the impact of climate change onto species ranges will require a solid understanding of how climate and dispersal jointly control species ranges.
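
The two range-shape measures named in the Methods of entry 6 might be computed along the following lines. This is a minimal sketch under stated assumptions: the slope is an ordinary least-squares fit of latitudinal range on longitudinal range, and the coefficient of determination is taken relative to the 1:1 line (residuals are latitudinal minus longitudinal range). The abstract does not give the exact formulas, and the function name is hypothetical.

```python
def range_shape_stats(lon_ranges, lat_ranges):
    """Return (slope, r2_identity) for paired longitudinal/latitudinal ranges.

    slope: least-squares slope of latitudinal range on longitudinal range.
    r2_identity: 1 - SS_res / SS_tot, with residuals measured from the
    1:1 line (an assumed reading of the abstract). Can be negative when
    the 1:1 line fits worse than the mean.
    """
    n = len(lon_ranges)
    mean_x = sum(lon_ranges) / n
    mean_y = sum(lat_ranges) / n
    sxx = sum((x - mean_x) ** 2 for x in lon_ranges)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(lon_ranges, lat_ranges))
    slope = sxy / sxx
    ss_tot = sum((y - mean_y) ** 2 for y in lat_ranges)
    ss_res = sum((y - x) ** 2 for x, y in zip(lon_ranges, lat_ranges))
    return slope, 1.0 - ss_res / ss_tot
```

Species whose latitudinal and longitudinal ranges are equal give a slope of 1 and a coefficient of 1; departures in either value indicate elongated or scattered range shapes.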

7.
We describe an algorithm for finding the most statistically significant non-overlapping subtrees of a hierarchical clustering of gene expression data with respect to a set of secondary data labels on genes. The method is implemented as a Java plug-in for a commercial gene expression analysis program (GeneSpring).

8.

Background  

Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained.

9.
Self-organizing maps (SOM) constitute an alternative to classical clustering methods because of their linear run times and superior performance in dealing with noisy data. Nevertheless, the clustering obtained with SOM is dependent on the relative sizes of the clusters. Here, we show how the combination of SOM with hierarchical clustering methods constitutes an excellent tool for exploratory analysis of massive data like DNA microarray expression patterns.

10.

Background

Phyletic patterns denote the presence and absence of orthologous genes in completely sequenced genomes and are used to infer functional links between genes, on the assumption that genes involved in the same pathway or functional system are co-inherited by the same set of genomes. However, this basic premise has not been quantitatively tested, and the limits of applicability of the phyletic-pattern method remain unknown.

Results

We characterized a hierarchy of 3,688 phyletic patterns encompassing more than 5,000 known protein-coding genes from 66 complete microbial genomes, using different distances, clustering algorithms, and measures of cluster quality. The most sensitive set of parameters recovered 223 clusters, each consisting of genes that belong to the same metabolic pathway or functional system. Fifty-six clusters included unexpected genes with plausible functional links to the rest of the cluster. Only a small percentage of known pathways and multiprotein complexes are co-inherited as one cluster; most are split into many clusters, indicating that gene loss and displacement has occurred in the evolution of most pathways.

Conclusions

Phyletic patterns of functionally linked genes are perturbed by differential gains, losses and displacements of orthologous genes in different species, reflecting the high plasticity of microbial genomes. Groups of genes that are co-inherited can, however, be recovered by hierarchical clustering, and may represent elementary functional modules of cellular metabolism. The phyletic patterns approach alone can confidently predict the functional linkages for about 24% of the entire data set.

11.
12.
13.
Recordings of ongoing neural activity with EEG and MEG exhibit oscillations of specific frequencies over a non-oscillatory background. The oscillations appear in the power spectrum as a collection of frequency bands that are evenly spaced on a logarithmic scale, thereby preventing mutual entrainment and cross-talk. Over the last few years, experimental, computational and theoretical studies have made substantial progress on our understanding of the biophysical mechanisms underlying the generation of network oscillations and their interactions, with emphasis on the role of neuronal synchronization. In this paper we ask a very different question. Rather than investigating how brain rhythms emerge, or whether they are necessary for neural function, we focus on what they tell us about functional brain connectivity. We hypothesized that if we were able to construct abstract networks, or "virtual brains", whose dynamics were similar to EEG/MEG recordings, those networks would share structural features among themselves, and also with real brains. Applying mathematical techniques for inverse problems, we have reverse-engineered network architectures that generate characteristic dynamics of actual brains, including spindles and sharp waves, which appear in the power spectrum as frequency bands superimposed on a non-oscillatory background dominated by low frequencies. We show that all reconstructed networks display similar topological features (e.g. structural motifs) and dynamics. We have also reverse-engineered putative diseased brains (epileptic and schizophrenic), in which the oscillatory activity is altered in different ways, as reported in clinical studies. These reconstructed networks show consistent alterations of functional connectivity and dynamics. 
In particular, we show that the complexity of the network, quantified as proposed by Tononi, Sporns and Edelman, is a good indicator of brain fitness, since virtual brains modeling diseased states display lower complexity than virtual brains modeling normal neural function. We finally discuss the implications of our results for the neurobiology of health and disease.

14.
15.
A C May 《Proteins》1999,37(1):20-29
Recently, several hierarchical classifications of protein three-dimensional (3D) structures have been published. However, none of them provides any assessment of the validity of a hierarchical representation or tests the individual clusters contained within. In fact, testing here of published trees reveals that they vary in meaning. Protein structure similarity measures are then assessed in terms of the robustness of the resulting trees for 24 protein families. A meaningful tree is defined as one in which all the clusters are found to be reliable according to a jackknife test. With the use of this criterion, a previously published similarity measure described as a "better RMS" is shown in fact to be usually less suited to protein fold classification than normal RMS after superposition. Here the "best" protein structure similarity measure for hierarchical classification (in terms of that which after clustering produces the highest number of meaningful trees, 20, for the 24 families) is found to be a new one. This measure includes information on the relationship of a distance at a given aligned position in a pair to the rest of the unique distances at that position in a protein family. There are only 2 families of the 24 tested, the globins (3 trees) and Kazal-type serine proteinase inhibitors (21 trees), in which the topology (branching order) of the meaningful 3D structure-based trees is constant. Thus, a new view of protein family sequence-structure relationships is afforded by comparing meaningful trees for each family. More generally, there is a need for care in interpretation of the results of those molecular biology algorithms that force a tree structure on data without assessing its applicability. Proteins 1999;37:20-29.

16.
MOTIVATION: Recent technological advances such as cDNA microarray technology have made it possible to simultaneously interrogate thousands of genes in a biological specimen. A cDNA microarray experiment produces a gene expression 'profile'. Often interest lies in discovering novel subgroupings, or 'clusters', of specimens based on their profiles, for example identification of new tumor taxonomies. Cluster analysis techniques such as hierarchical clustering and self-organizing maps have frequently been used for investigating structure in microarray data. However, clustering algorithms always detect clusters, even on random data, and it is easy to misinterpret the results without some objective measure of the reproducibility of the clusters. RESULTS: We present statistical methods for testing for overall clustering of gene expression profiles, and we define easily interpretable measures of cluster-specific reproducibility that facilitate understanding of the clustering structure. We apply these methods to elucidate structure in cDNA microarray gene expression profiles obtained on melanoma tumors and on prostate specimens.

17.
We have examined the global population genetic structure of Haemonchus contortus. The genetic variability was studied using both amplified fragment length polymorphism (AFLP) and nad4 sequences of the mitochondrial genome. To examine the performance and information content of the two different marker systems, comparative assessment of population genetic diversity was undertaken in 19 isolates of H. contortus, a parasitic nematode of small ruminants. A total of 150 individual adult worms representing 14 countries from all inhabited continents were analysed. Altogether 1,429 informative AFLP markers were generated using four different primer combinations. Also, the genetic variation was high, which agrees with results from previous AFLP studies of nematode parasites of livestock. The genetic structure was high, indicating limited gene flow between the different isolates, and populations from each continent mostly formed monophyletic groups in the phylogenetic analysis. However, for isolates representing Australia, Greece and one laboratory strain that originated from South Africa (WRS), there was no clear genetic relationship between the isolates and the distance between their geographical origins. Basically the same pattern was observed for the mitochondrial marker, although the phylogenetic analysis was less resolved than for AFLP. In contrast with previous findings on the population genetic structure of H. contortus, the calculation of population structure gave high values (Nst=0.59). The strong structure was present also for the four Swedish isolates (Nst=0.16) representing a small geographical area.

18.
Understanding how invasive species spread is of particular concern in the current era of globalisation and rapid environmental change. The occurrence of super‐diffusive movements within the context of Lévy flights has been discussed with respect to particle physics, human movements, microzooplankton, disease spread in global epidemiology and animal foraging behaviour. Super‐diffusive movements provide a theoretical explanation for the rapid spread of organisms and disease, but their applicability to empirical data on the historic spread of organisms has rarely been tested. This study focuses on the role of long‐distance dispersal in the invasion dynamics of aquatic invasive species across three contrasting areas and spatial scales: open ocean (north‐east Atlantic), enclosed sea (Mediterranean) and an island environment (Ireland). Study species included five freshwater plant species, Azolla filiculoides, Elodea canadensis, Lagarosiphon major, Elodea nuttallii and Lemna minuta; and ten species of marine algae, Asparagopsis armata, Antithamnionella elegans, Antithamnionella ternifolia, Codium fragile, Colpomenia peregrina, Caulerpa taxifolia, Dasysiphonia sp., Sargassum muticum, Undaria pinnatifida and Womersleyella setacea. A simulation model is constructed to show the validity of using historical data to reconstruct dispersal kernels. Lévy movement patterns similar to those previously observed in humans and wild animals are evident in the re‐constructed dispersal pattern of invasive aquatic species. Such patterns may be widespread among invasive species and could be exacerbated by further development of trade networks, human travel and environmental change. These findings have implications for our ability to predict and manage future invasions, and improve our understanding of the potential for spread of organisms including infectious diseases, plant pests and genetically modified organisms.

19.
Multiple sequence alignment with hierarchical clustering.
F Corpet 《Nucleic acids research》1988,16(22):10881-10890
An algorithm is presented for the multiple alignment of sequences, either proteins or nucleic acids, that is both accurate and easy to use on microcomputers. The approach is based on the conventional dynamic-programming method of pairwise alignment. Initially, a hierarchical clustering of the sequences is performed using the matrix of the pairwise alignment scores. The closest sequences are aligned, creating groups of aligned sequences. Then close groups are aligned until all sequences are aligned in one group. The pairwise alignments included in the multiple alignment form a new matrix that is used to produce a hierarchical clustering. If it is different from the first one, iteration of the process can be performed. The method is illustrated by an example: a global alignment of 39 sequences of cytochrome c.
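
The clustering step in entry 19, where a matrix of pairwise alignment scores determines the order in which sequences and groups are aligned, can be sketched as follows. The sketch assumes average linkage on the score matrix and leaves out the alignment itself; the abstract does not specify the linkage rule, and the function name is hypothetical.

```python
def guide_merge_order(scores):
    """Return the order in which groups would be merged.

    scores: symmetric matrix of pairwise alignment scores, where a higher
    score means more similar sequences. Each merge is a pair of tuples of
    sequence indices. Average linkage is an assumed choice.
    """
    clusters = [(i,) for i in range(len(scores))]
    merges = []

    def avg_score(a, b):
        # Mean pairwise score between members of group a and group b.
        return sum(scores[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the pair of groups with the highest average score.
        _, a, b = max(
            ((avg_score(a, b), a, b)
             for idx, a in enumerate(clusters)
             for b in clusters[idx + 1:]),
            key=lambda item: item[0],
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b))
    return merges
```

Each merge corresponds to one alignment step: first the closest sequences, then progressively larger groups, until a single group contains every sequence.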

20.

Background  

Searching a biological sequence database with a query sequence looking for homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines, it is still virtually impossible to identify quickly and clearly a group of sequences that a given query sequence belongs to.
