Similar literature
Found 20 similar articles (search time: 0 ms)
1.
We propose an algorithm for selecting and clustering genes according to their time-course or dose-response profiles using gene expression data. The proposed algorithm is based on the order-restricted inference methodology developed in statistics. We describe the methodology for time-course experiments although it is applicable to any ordered set of treatments. Candidate temporal profiles are defined in terms of inequalities among mean expression levels at the time points. The proposed algorithm selects genes when they meet a bootstrap-based criterion for statistical significance and assigns each selected gene to the best fitting candidate profile. We illustrate the methodology using data from a cDNA microarray experiment in which a breast cancer cell line was stimulated with estrogen for different time intervals. In this example, our method was able to identify several biologically interesting genes that previous analyses failed to reveal.

2.
MOTIVATION: Hierarchical clustering is widely used to cluster genes into groups based on their expression similarity. This method first constructs a tree. Next, this tree is partitioned into subtrees by cutting all edges at some level, thereby inducing a clustering. Unfortunately, the resulting clusters often do not exhibit significant functional coherence. RESULTS: To improve the biological significance of the clustering, we develop a new framework of partitioning by snipping: cutting selected edges at variable levels. The snipped edges are selected to induce clusters that are maximally consistent with partially available background knowledge such as functional classifications. Algorithms for two key applications are presented: functional prediction of genes, and discovery of functionally enriched clusters of co-expressed genes. Simulation results and cross-validation tests indicate that the algorithms perform well even when the actual number of clusters differs considerably from the requested number. Performance is improved compared with a previously proposed algorithm. AVAILABILITY: A Java package is available at http://www.cs.bgu.ac.il/~dotna/ TreeSnipping

3.
A new result report for Mascot search results is described. A greedy set cover algorithm is used to create a minimal set of proteins, which is then grouped into families on the basis of shared peptide matches. Protein families with multiple members are represented by dendrograms, generated by hierarchical clustering using the score of the nonshared peptide matches as a distance metric. The peptide matches to the proteins in a family can be compared side by side to assess the experimental evidence for each protein. If the evidence for a particular family member is considered inadequate, the dendrogram can be cut to reduce the number of distinct family members.
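The greedy set cover step mentioned in this abstract can be illustrated with a minimal sketch. The function name `greedy_set_cover`, the input layout (protein name mapped to a set of matched peptides) and the alphabetical tie-break are assumptions made for illustration; they are not taken from the Mascot report itself. Note that a greedy cover is small but not guaranteed to be the smallest possible one.

```python
def greedy_set_cover(protein_peptides):
    """Greedily pick proteins until every observed peptide is explained.

    protein_peptides: dict mapping a protein name to the set of peptides
    matched to it (hypothetical input layout, for illustration only).
    """
    uncovered = set()
    for peptides in protein_peptides.values():
        uncovered |= peptides

    chosen = []
    while uncovered:
        # Take the protein that explains the most still-uncovered peptides.
        # Iterating over sorted names makes ties resolve alphabetically.
        best = max(
            sorted(protein_peptides),
            key=lambda name: len(protein_peptides[name] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_peptides[best]
    return chosen


matches = {
    "P1": {"a", "b", "c"},
    "P2": {"b", "c"},
    "P3": {"d"},
    "P4": {"c", "d"},
}
print(greedy_set_cover(matches))
```

Here `P2` is never selected because all of its peptides are already explained by `P1`, which is the kind of redundancy the protein-grouping step is meant to remove.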

4.
5.
6.
Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet processes (HDP). The HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as gene expression data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for the HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces the unnecessary clustering fragments.
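The Chinese restaurant metaphor referred to above can be sketched for a single, non-hierarchical restaurant. This is only the seating prior, not the authors' HDP Gibbs sampler; the function name, the concentration parameter name `alpha` and the fixed seed are assumptions made for illustration.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Seat customers one by one under a Chinese restaurant process.

    Each customer joins existing table k with probability proportional to
    the number of customers already there, or opens a new table with
    probability proportional to alpha (assumed > 0).
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for _ in range(n_customers):
        weights = table_sizes + [alpha]  # last slot means "new table"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[k] += 1
        assignments.append(k)
    return assignments, table_sizes


assignments, table_sizes = chinese_restaurant_process(50, alpha=1.0)
print(len(table_sizes), "tables for", len(assignments), "customers")
```

The tables play the role of clusters: the number of clusters is not fixed in advance but grows with the data, which is the property the HDP approach builds on.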

7.
A new method for identifying dynamical domains in proteins, Hierarchical Clustering of the Correlation Patterns (HCCP), is proposed. HCCP identifies the domains from a single three-dimensional structure of the studied protein and does not require any adjustable parameters that could influence the results. The method is based on hierarchical clustering performed on the matrices of correlation patterns, which are obtained by the transformation of ordinary pairwise correlation matrices. This approach extracts additional information from the correlation matrices, which increases the reliability of domain identification. It is shown that HCCP is insensitive to small variations of the pairwise correlation matrices. In particular, it produces identical results when data obtained for the same protein crystallized with different spatial positions of the domains are used for analysis. HCCP can utilize correlation matrices obtained by any method, such as normal mode or essential dynamics analysis, or Gaussian network or anisotropic network models. These features make HCCP an attractive method for domain identification in proteins.

8.
Large-scale two-dimensional gel experiments have the potential to identify proteins that play an important role in elucidating cell mechanisms and in various stages of drug discovery. Such experiments, typically including hundreds or even thousands of related gels, are notoriously difficult to perform, and analysis of the gel images has until recently been virtually impossible. In this paper we describe a scalable computational model that permits the organization and analysis of a large gel collection. The model is implemented in Compugen's Z4000 system. Gels are organized in a hierarchical, multidimensional data structure that allows the user to view a large-scale experiment as a tree of numerous simpler experiments, and carry out the analysis one step at a time. Analyzed sets of gels form processing units that can be combined into higher level units in an iterative framework. The different conditions at the core of the experiment design, termed the dimensions of the experiment, are transformed from a multidimensional structure to a single hierarchy. The higher level comparison is performed with the aid of a synthetic "adaptor" gel image, called a Raw Master Gel (RMG). The RMG allows data from an entire set of gels to be presented as a gel image, thereby enabling the iterative process. Our model includes a flexible experimental design approach that allows the researcher to choose the condition to be analyzed a posteriori. It also enables data reuse, that is, performing several different analysis designs on the same experimental data. The stability and reproducibility of a protein can be analyzed by tracking it up or down the hierarchical dimensions of the experiment.

9.
10.
Ortholog identification is a crucial first step in comparative genomics. Here, we present a rapid method of ortholog grouping which is effective enough to allow the comparison of many genomes simultaneously. The method takes as input all-against-all similarity data and classifies genes based on the traditional hierarchical clustering algorithm UPGMA. In the course of clustering, the method detects domain fusion or fission events, and splits clusters into domains if required. The subsequent procedure splits the resulting trees such that intra-species paralogous genes are divided into different groups so as to create plausible orthologous groups. As a result, the procedure can split genes into the domains minimally required for ortholog grouping. The procedure, named DomClust, was tested using the COG database as a reference. When comparing several clustering algorithms combined with the conventional bidirectional best-hit (BBH) criterion, we found that our method generally showed better agreement with the COG classification. By comparing the clustering results generated from datasets of different releases, we also found that our method showed relatively good stability in comparison to the BBH-based methods.
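As a rough illustration of the UPGMA step named in this abstract, the sketch below runs plain UPGMA on a small distance table. It does not include DomClust's domain splitting or paralog handling; the function name `upgma`, the input layout and the toy distances are assumptions made for illustration.

```python
def upgma(labels, distances):
    """Plain UPGMA on a small distance table.

    labels: list of item names.
    distances: dict mapping an (a, b) pair of labels to their distance.
    Returns a list of merges (left, right, distance at which they joined).
    """
    sizes = {label: 1 for label in labels}  # cluster -> number of leaves
    d = {frozenset(pair): value for pair, value in distances.items()}
    merges = []
    while len(sizes) > 1:
        closest = min(d, key=d.get)
        a, b = sorted(closest, key=str)  # deterministic left/right order
        merged = (a, b)
        merges.append((a, b, d[closest]))
        for other in sizes:
            if other == a or other == b:
                continue
            # UPGMA update: size-weighted average of the two old distances.
            d[frozenset((merged, other))] = (
                sizes[a] * d[frozenset((a, other))]
                + sizes[b] * d[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        # Drop every distance that still refers to the two merged clusters.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return merges


toy = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
for left, right, dist in upgma(["A", "B", "C", "D"], toy):
    print(left, right, dist)
```

The abstract describes similarity data as input; a similarity score would first need converting to a distance (or the `min` replaced by `max`) before a routine like this applies.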

11.
We have investigated the use of hierarchical clustering of flow cytometry data to classify samples of conventional central chondrosarcoma, a malignant cartilage-forming tumor of uncertain cellular origin, according to similarities with surface marker profiles of several known cell types. Human primary chondrosarcoma cells, articular chondrocytes, mesenchymal stem cells, fibroblasts, and a panel of tumor cell lines of chondrocytic or epithelial origin were clustered based on the expression profile of eleven surface markers. For clustering, eight hierarchical clustering algorithms, three distance metrics, as well as several approaches for data preprocessing, including multivariate outlier detection, logarithmic transformation, and z‐score normalization, were systematically evaluated. By selecting clustering approaches shown to give reproducible results for cluster recovery of known cell types, primary conventional central chondrosarcoma cells could be grouped in two main clusters with distinctive marker expression signatures: one group clustering together with mesenchymal stem cells (CD49b‐high/CD10‐low/CD221‐high) and a second group clustering close to fibroblasts (CD49b‐low/CD10‐high/CD221‐low). Hierarchical clustering also revealed substantial differences between primary conventional central chondrosarcoma cells and established chondrosarcoma cell lines, with the latter not only segregating apart from primary tumor cells and normal tissue cells, but clustering together with cell lines of epithelial lineage. Our study provides a foundation for the use of hierarchical clustering applied to flow cytometry data as a powerful tool to classify samples according to marker expression patterns, which could lead to the discovery of new cancer subtypes. J. Cell. Physiol. 225: 601–611, 2010. © 2010 Wiley‐Liss, Inc.

12.
We perform a computational study using a new approach to the analysis of protein sequences. The contextual alignment model, proposed recently by Gambin et al. (2002), is based on the assumption that, while constructing an alignment, the score of a substitution of one residue by another depends on the surrounding residues. The contextual alignment scores calculated in this model were used for hierarchical clustering of several protein families from the database of Clusters of Orthologous Groups (COG). The clustering was also constructed using the standard approach. The comparative analysis shows that the contextual model results in more consistent clustering trees. The difference, although small, is without exception in favour of the contextual model. The consistency of the family of trees is measured by several consensus and agreement methods, as well as by the inter-tree distance approach.

13.
POPPs is a suite of inter-related software tools that allow the user to discover what is statistically 'unusual' in the composition of an unknown protein, or to automatically cluster proteins into families based on peptide composition. Finally, the user can search for related proteins based on peptide composition. Statistically based peptide composition provides a view of proteins that is, to some extent, orthogonal to that provided by sequence. In a test study, the POPPs suite is able to regroup into their families sets of approximately 100 randomised Pfam protein domains. The POPPs suite is used to explore the diverse set of late embryogenesis abundant (LEA) proteins.

14.
The purpose of this study was to examine the hierarchical complexity of combinatorial manipulation in capuchin monkeys (Cebus apella). Two experiments were conducted. In Experiment 1 capuchins were presented with an apparatus designed to accommodate the use of probing tools. In Experiment 2 the same capuchins were presented with sets of nesting containers. Five of the ten subjects used probing tools and seven subjects placed objects in the containers. The capuchins' behavior reflected three hierarchically organized combinatorial patterns displayed by chimpanzees and human infants. Although the capuchins sometimes displayed the two more complex patterns (“pot” and “subassembly”), their combinatorial behavior was dominated by the simplest pattern (“pairing”). In this regard capuchins may not attain the same grammar of manipulative action that has been reported for chimpanzees and young human children. © 1994 Wiley-Liss, Inc.

15.
Often a screening or selection experiment targets a cell or tissue that presents many possible molecular targets, and so identifies a correspondingly large number of ligands. We describe a statistical method to extract an estimate of the complexity or richness of the set of molecular targets from competition experiments between distinguishable ligands, including aptamers derived from combinatorial experiments (SELEX or phage display). In simulations, the non-parametric statistic provides a robust estimate of complexity from a 100 × 100 matrix of competition experiments, which is clearly feasible in high-throughput format. The statistic and method are potentially applicable to other ligand binding situations.

16.
Summary: A detailed analysis was undertaken to test the efficacy of hierarchical agglomerative clustering (UPGMA method) in grouping the races and strains of the mulberry silkworm, Bombyx mori L., and to ascertain the importance of biochemical parameters in the clustering process. The analysis was based on data from two rearing seasons with 54 selected races/strains of different geographic origin and varying yield potentials. The results indicate that seven clusters can be realised with yield parameters alone, whereas the inclusion of biochemical parameters in clustering resulted in two broad groups: one having all the breeds with high cocoon weight and shell weight, the other having all the low-yielding silkworm strains both from India and from other countries. Further sub-grouping under these two groups highlights genetic differences associated with the differentiation of various groups of races in temperate and tropical areas as well as their significance for silkworm breeding. Estimates of all ten variables were further subjected to quick clustering and the results showed that cluster 5, constituted by 38 low-yielding strains of India, China and Europe, had the highest values of the final cluster centre for amylase and the effective rate of rearing (ERR), while clusters 1 and 4 had the highest values for invertase and alkaline phosphatase. The evolutionary aspect of the genetic channelisation of silkworm races from various countries is discussed against the background of differences in the biochemical parameters and yield variables.

17.
PURPOSE: Segmentation of cardiac sub-structures for dosimetric analyses is usually performed manually in a time-consuming procedure. Automatic segmentation may facilitate large-scale retrospective analysis and adaptive radiotherapy. Various approaches, among them Hierarchical Clustering, were applied to improve the performance of atlas-based segmentation (ABS). METHODS: The training dataset for ABS consisted of 36 manually contoured CT scans. Twenty-five cardiac sub-structures were contoured as regions of interest (ROIs). Five auto-segmentation methods were compared: simultaneous automatic contouring of all 25 ROIs (Method-1); automatic contouring of all 25 ROIs using lungs as anatomical barriers (Method-2); automatic contouring of a single ROI for each contouring cycle (Method-3); hierarchical cluster-based automatic contouring (Method-4); simultaneous truth and performance level estimation (STAPLE). Results were evaluated on 10 patients. Dice similarity coefficient (DSC), average Hausdorff distance (AHD), volume comparison and physician score were used as validation metrics. RESULTS: Atlas performance improved with an increasing number of atlases. Among the five ABS methods, the Hierarchical Clustering workflow showed a significant improvement while maintaining a clinically acceptable contouring time. Physician scoring was acceptable for 70% of the automatically contoured ROIs. Inter-observer evaluation showed that, in terms of DSC, contours obtained by the Hierarchical Clustering method are statistically comparable with those obtained by a second, independent expert contourer. In terms of AHD, the distance from the gold standard is lower for ROIs segmented by ABS. CONCLUSIONS: Hierarchical clustering resulted in the best ABS results for the primarily investigated platforms and compared favorably to a second benchmark system. Auto-contouring of smaller structures, being in range of variation between manual contourers, may be ideal for large-scale retrospective dosimetric analysis.
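The Dice similarity coefficient used as a validation metric here is defined as DSC = 2|A ∩ B| / (|A| + |B|) for two segmentations A and B. A minimal sketch follows; representing each contour as a set of voxel coordinates, and returning 1.0 when both contours are empty, are assumptions made for illustration rather than details from the study.

```python
def dice_coefficient(voxels_a, voxels_b):
    """Dice similarity coefficient between two sets of voxel coordinates.

    Returns a value in [0, 1]; 1.0 means the two contours coincide.
    Two empty contours are treated as identical (a convention assumed here).
    """
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical 2D example: automatic versus manual contour of one ROI.
auto_contour = {(0, 0), (0, 1), (1, 1)}
manual_contour = {(0, 1), (1, 1), (1, 2)}
print(dice_coefficient(auto_contour, manual_contour))
```

Because the denominator is the sum of both volumes, DSC tends to penalise small structures more heavily for the same absolute boundary error, which is one reason a distance measure such as AHD is usually reported alongside it.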

18.
A new measure (CL) of spatial/structural landscape complexity is developed in this paper, based on the Levenshtein algorithm used in Computer Science and Bioinformatics for string comparisons. The Levenshtein distance (or edit distance) between two strings of symbols is the minimum number of replacements, deletions and insertions necessary to convert one string into the other. In this paper, it is shown how this measure can be applied to raster landscape maps of any size or shape. Calculations and applications are shown on model and real landscapes. The main advantages of this measure for structural (spatial) landscape analysis are the following: it is easily applicable; it can be compared to its maximum value (depending on the grid resolution); it can be used to compare structural/spatial complexities between landscapes; it is applicable to raster landscape maps of any shape; and it can be used to calculate changes in landscape complexity over time. At the level of ecological practice, it may aid in landscape monitoring, management and planning, by identifying areas of higher structural landscape complexity, which may deserve greater attention in the process of landscape conservation.
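The edit distance defined above can be computed with the standard dynamic-programming recurrence, sketched below. How the paper turns a raster map into strings is not stated in the abstract; comparing two rows of land-cover codes, as in the second call, is only an assumed illustration.

```python
def levenshtein(s, t):
    """Minimum number of replacements, deletions and insertions turning s into t."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # delete cs
                curr[j - 1] + 1,     # insert ct
                prev[j - 1] + cost,  # replace cs by ct (free if they match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("AABB", "ABBB"))       # 1, two hypothetical raster rows
```

Keeping only two rows of the table at a time gives memory proportional to the length of `t`, which matters when the strings come from large grids.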

19.
MOTIVATION: A common problem in the emerging field of metabolomics is the consolidation of signal lists derived from metabolic profiling of different cell/tissue/fluid states where a number of replicate experiments were collected on each state. RESULTS: We describe an approach for the consolidation of peak lists based on hierarchical clustering, first within each set of replicate experiments and then between the sets of replicate experiments. The problems of finding the dendrogram tree cutoff which gives the optimal number of peak clusters and the effect of different clustering methods were addressed. When applied to gas chromatography-mass spectrometry metabolic profiling data acquired on Leishmania mexicana, this approach resulted in robust data matrices which completely separated the wild-type and two mutant parasite lines based on their metabolic profile.

20.
Cytokinesis: placing and making the final cut (cited 7 times in total: 0 self-citations, 7 by others)
Barr FA, Gruneberg U. Cell, 2007, 131(5): 847-860
Cytokinesis is the process by which cells physically separate after the duplication and spatial segregation of the genetic material. A number of general principles apply to this process. First, the microtubule cytoskeleton plays an important role in the choice and positioning of the division site. Once the site is chosen, the local assembly of the actomyosin contractile ring remodels the plasma membrane. Finally, membrane trafficking to and membrane fusion at the division site cause the physical separation of the daughter cells, a process termed abscission. Here we will discuss recent advances in our understanding of the mechanisms of cytokinesis in animals, yeast, and plants.
