Similar literature
 20 similar articles found
1.
Despite the availability of newer approaches, traditional hierarchical clustering remains very popular in genetic diversity studies in plants. However, little is known about its suitability for molecular marker data. We studied the performance of traditional hierarchical clustering techniques using real and simulated molecular marker data. Our study also compared the performance of traditional hierarchical clustering with model-based clustering (STRUCTURE). We showed that the cophenetic correlation coefficient is directly related to subgroup differentiation and can thus be used as an indicator of the presence of genetically distinct subgroups in germplasm collections. Whereas UPGMA performed well in preserving distances between accessions, Ward excelled in recovering groups. Our results also showed a close similarity between clusters obtained by Ward and by STRUCTURE. Traditional cluster analysis can provide an easy and effective way of determining structure in germplasm collections using molecular marker data, and the output can be used for sampling core collections or for association studies.
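The cophenetic correlation coefficient mentioned in entry 1 can be illustrated with a small sketch. It is not the authors' code: it assumes a full symmetric distance matrix, uses a naive UPGMA (average linkage) implementation, and the helper names and toy data are hypothetical. The coefficient is the Pearson correlation between the original pairwise distances and the heights at which each pair is first merged in the tree.

```python
from itertools import combinations
from math import sqrt


def upgma_cophenetic(dist):
    """Naive UPGMA (average linkage) on a full symmetric distance matrix.

    Returns {(i, j): height} with i < j, where height is the average
    inter-cluster distance at which items i and j were first merged.
    """
    clusters = {i: [i] for i in range(len(dist))}
    coph = {}
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest average distance.
        for a, b in combinations(sorted(clusters), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Every cross pair gets the merge height as its cophenetic distance.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = d
        clusters[a] = clusters[a] + clusters.pop(b)
    return coph


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cophenetic_correlation(dist):
    coph = upgma_cophenetic(dist)
    pairs = sorted(coph)
    return pearson([dist[i][j] for i, j in pairs], [coph[p] for p in pairs])


# Toy data (an assumption): two well-separated groups on a line.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
c = cophenetic_correlation(dist)
```

With two clearly separated groups, as in this toy data, the tree reproduces the original distances closely and the coefficient is near 1, which matches the abstract's point that the coefficient tracks subgroup differentiation.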

2.

Background

The Framingham Heart Study has contributed a great deal to advances in medicine. Most of the phenotypes investigated have been univariate traits (quantitative or qualitative). The aims of this study are to derive multivariate traits by identifying homogeneous groups of people and assigning both qualitative and quantitative trait scores; to assess the heritability of the derived traits; and to conduct both qualitative and quantitative linkage analysis on one of the heritable traits.

Methods

Multiple correspondence analysis, a nonparametric analogue of principal components analysis, was used for data reduction. Two-stage clustering, using both k-means and agglomerative hierarchical clustering, was used to cluster individuals based upon axes (factor) scores obtained from the data reduction. Probability of cluster membership was calculated using binary logistic regression. Heritability was calculated using SOLAR, which was also used for the quantitative trait analysis. GENEHUNTER-PLUS was used for the qualitative trait analysis.

Results

We found four phenotypically distinct groups. Membership in the smallest group was heritable (38%, p < 1 × 10⁻⁶) and had characteristics consistent with atherogenic dyslipidemia. We found both qualitative and quantitative LOD scores above 3 on chromosomes 11 and 14 (11q13, 14q23, 14q31). There were two Kong & Cox LOD scores above 1.0 on chromosome 6 (6p21) and chromosome 11 (11q23).

Conclusion

This approach may be useful for the identification of genetic heterogeneity in complex phenotypes by clarifying the phenotype definition prior to linkage analysis. Some of our findings are in regions linked to elements of atherogenic dyslipidemia and related diagnoses; others may be novel or may be false positives.

3.
Partitioning closely related genes into clusters has become an important element of practically all statistical analyses of microarray data. A number of computer algorithms have been developed for this task. Although these algorithms have demonstrated their usefulness for gene clustering, some basic problems remain. This paper describes our work on extracting functional keywords from MEDLINE for a set of genes that are isolated for further study from microarray experiments based on their differential expression patterns. The sharing of functional keywords among genes is used as a basis for clustering in a new approach called BEA-PARTITION. Functional keywords associated with genes were extracted from MEDLINE abstracts. We modified the Bond Energy Algorithm (BEA), which is widely accepted in psychology and database design but is virtually unknown in bioinformatics, to cluster genes by functional keyword associations. The results showed that BEA-PARTITION and the hierarchical clustering algorithm outperformed k-means clustering and the self-organizing map by correctly assigning 25 of 26 genes in a test set of four known gene groups. To evaluate the effectiveness of BEA-PARTITION for clustering genes identified by microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle and have been widely studied in the literature were used as a second test set. Using established measures of cluster quality, the results produced by BEA-PARTITION had higher purity, lower entropy, and higher mutual information than those produced by k-means and the self-organizing map. Whereas BEA-PARTITION and hierarchical clustering produced clusters of similar quality, BEA-PARTITION provides clearer cluster boundaries than hierarchical clustering. BEA-PARTITION is simple to implement and provides a powerful approach to clustering genes or to any clustering problem where starting matrices are available from experimental observations.
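Purity and entropy, two of the cluster-quality measures named in entry 3, can be sketched as follows. This is a minimal illustration using the common textbook definitions, which may differ in detail from the ones used in the paper; the function names and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def _label_counts(cluster_ids, true_labels):
    """Count how often each true label occurs inside each cluster."""
    counts = defaultdict(Counter)
    for c, t in zip(cluster_ids, true_labels):
        counts[c][t] += 1
    return counts


def purity(cluster_ids, true_labels):
    """Fraction of items that carry the majority label of their cluster."""
    counts = _label_counts(cluster_ids, true_labels)
    return sum(max(c.values()) for c in counts.values()) / len(true_labels)


def entropy(cluster_ids, true_labels):
    """Size-weighted average of the label entropy (in bits) per cluster."""
    counts = _label_counts(cluster_ids, true_labels)
    n = len(true_labels)
    total = 0.0
    for c in counts.values():
        size = sum(c.values())
        h = -sum((v / size) * log2(v / size) for v in c.values())
        total += (size / n) * h
    return total


# Toy example (an assumption): one of six genes lands in the wrong cluster.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
p = purity(clusters, labels)
e = entropy(clusters, labels)
```

Higher purity and lower entropy both indicate clusters that agree better with the known groups.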

4.
MOTIVATION: Clustering has been used as a popular technique for finding groups of genes that show similar expression patterns under multiple experimental conditions. Many clustering methods have been proposed for clustering gene-expression data, including hierarchical clustering, k-means clustering and the self-organizing map (SOM). However, the conventional methods are limited in their ability to identify clusters of different shapes because they use a fixed distance norm when calculating the distance between genes. The fixed distance norm imposes a fixed geometrical shape on the clusters regardless of the actual data distribution. Thus, different distance norms are required for handling the different shapes of clusters. RESULTS: We present the Gustafson-Kessel (GK) clustering method for microarray gene-expression data. To detect clusters of different shapes in a dataset, we use an adaptive distance norm that is calculated from a fuzzy covariance matrix (F) of each cluster, in which the eigenstructure of F is used as an indicator of the shape of the cluster. Moreover, the GK method is less prone to falling into local minima than k-means and SOM because it makes decisions through the use of membership degrees of a gene to clusters. The algorithmic procedure is accomplished by the alternating optimization technique, which iteratively improves a sequence of sets of clusters until no further improvement is possible. To test the performance of the GK method, we applied the GK method and well-known conventional methods to three recently published yeast datasets, and compared the performance of each method using the Saccharomyces Genome Database annotations. The clustering results of the GK method are significantly more relevant to the biological annotations than those of the other methods, demonstrating its effectiveness and potential for clustering gene-expression data.
AVAILABILITY: The software was developed in Java and can be executed on any platform that runs a JVM (Java Virtual Machine). It is available from the authors upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at http://dragon.kaist.ac.kr/gk.
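The adaptive distance norm described in entry 4 can be sketched for a single cluster in two dimensions. The sketch assumes the standard Gustafson-Kessel formulation (norm matrix A = (ρ det F)^(1/n) F⁻¹ with cluster volume ρ and fuzzifier m); it is not the authors' Java software, and the function names, parameter defaults and toy points are hypothetical.

```python
def fuzzy_covariance(points, memberships, center, m=2.0):
    """Fuzzy covariance matrix F of one cluster (2-D points only)."""
    weights = [u ** m for u in memberships]
    total = sum(weights)
    sxx = sxy = syy = 0.0
    for (x, y), w in zip(points, weights):
        dx, dy = x - center[0], y - center[1]
        sxx += w * dx * dx
        sxy += w * dx * dy
        syy += w * dy * dy
    return [[sxx / total, sxy / total], [sxy / total, syy / total]]


def gk_norm_matrix(F, rho=1.0):
    """Norm-inducing matrix A = (rho * det F)^(1/n) * F^-1 with n = 2."""
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    scale = (rho * det) ** 0.5
    inv = [[F[1][1] / det, -F[0][1] / det],
           [-F[1][0] / det, F[0][0] / det]]
    return [[scale * v for v in row] for row in inv]


def gk_distance_sq(point, center, A):
    """Squared distance (x - v)^T A (x - v)."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    return dx * (A[0][0] * dx + A[0][1] * dy) + dy * (A[1][0] * dx + A[1][1] * dy)


# Toy cluster (an assumption): elongated along the x axis, full memberships.
pts = [(-2, 0), (-1, 0), (0, 0), (1, 0), (2, 0), (0, 0.5), (0, -0.5)]
u = [1.0] * len(pts)
v = (0.0, 0.0)
A = gk_norm_matrix(fuzzy_covariance(pts, u, v))
d_along = gk_distance_sq((1, 0), v, A)   # step along the long axis
d_across = gk_distance_sq((0, 1), v, A)  # same Euclidean step across it
```

Because the norm follows the cluster's covariance, a step along the elongated direction counts as much shorter than an equally long step across it, which is how the method adapts to cluster shape.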

5.
MOTIVATION: It is well understood that the successful clustering of expression profiles gives useful insight into the functions of uncharacterized genes. In order to realize such a successful clustering, we investigate a clustering method based on adaptive resonance theory (ART) in this report. RESULTS: We apply Fuzzy ART as a clustering method for analyzing the time series expression data during sporulation of Saccharomyces cerevisiae. The clustering result by Fuzzy ART was compared with those by other clustering methods such as hierarchical clustering, the k-means algorithm and self-organizing maps (SOMs). In terms of the mathematical validations, Fuzzy ART achieved the most reasonable clustering. We also verified the robustness of Fuzzy ART using noisy data. Furthermore, we defined the correctness ratio of clustering, which is based on genes whose temporal expressions are characterized biologically. Using this definition, we showed that the clustering ability of Fuzzy ART was superior to that of other clustering methods such as hierarchical clustering, the k-means algorithm and SOMs. Finally, we validate the clustering results by Fuzzy ART in terms of biological functions and evidence. AVAILABILITY: The software is available at http://www.nubio.nagoya-u.ac.jp/proc/index.html

6.
7.
In this paper, three different clustering algorithms were applied to assemble infrared (IR) spectral maps from IR microspectra of tissues. Using spectra from a colorectal adenocarcinoma section, we show how IR images can be assembled by agglomerative hierarchical (AH) clustering (Ward's technique), fuzzy C-means (FCM) clustering, and k-means (KM) clustering. We discuss practical problems of IR imaging on tissues such as the influence of spectral quality and data pretreatment on image quality. Furthermore, the applicability of cluster algorithms to the spatially resolved microspectroscopic data and the degree of correlation between distinct cluster images and histopathology are compared. The use of any of the clustering algorithms dramatically increased the information content of the IR images, as compared to univariate methods of IR imaging (functional group mapping). Among the cluster imaging methods, AH clustering (Ward's algorithm) proved to be the best method in terms of tissue structure differentiation.

8.
MOTIVATION: Clustering is one of the most widely used methods in unsupervised gene expression data analysis. The use of different clustering algorithms or different parameters often produces rather different results on the same data. Biological interpretation of multiple clustering results requires understanding how different clusters relate to each other. It is particularly non-trivial to compare the results of a hierarchical and a flat, e.g. k-means, clustering. RESULTS: We present a new method for comparing and visualizing relationships between different clustering results, either flat versus flat, or flat versus hierarchical. When comparing a flat clustering to a hierarchical clustering, the algorithm cuts different branches in the hierarchical tree at different levels to optimize the correspondence between the clusters. The optimization function is based on graph layout aesthetics or on mutual information. The clusters are displayed using a bipartite graph where the edges are weighted proportionally to the number of common elements in the respective clusters and the weighted number of crossings is minimized. The performance of the algorithm is tested using simulated and real gene expression data. The algorithm is implemented in the online gene expression data analysis tool Expression Profiler. AVAILABILITY: http://www.ebi.ac.uk/expressionprofiler
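For the flat-versus-flat case in entry 8, the bipartite edge weights (numbers of common elements) and a mutual-information score can be sketched as below. This only illustrates the two quantities named in the abstract, not the Expression Profiler implementation or its tree-cutting optimization; the function names and toy assignments are hypothetical.

```python
from collections import Counter
from math import log2


def contingency(a, b):
    """Edge weights of the bipartite graph: items shared by cluster pairs."""
    return Counter(zip(a, b))


def mutual_information(a, b):
    """Mutual information (in bits) between two flat clusterings."""
    n = len(a)
    joint = contingency(a, b)
    size_a, size_b = Counter(a), Counter(b)
    return sum(
        (nij / n) * log2(nij * n / (size_a[i] * size_b[j]))
        for (i, j), nij in joint.items()
    )


# Toy assignments (assumptions): the same partition under different labels,
# and a second partition that is unrelated to the first.
first = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]
unrelated = [0, 1, 0, 1]

mi_same = mutual_information(first, relabelled)
mi_unrelated = mutual_information(first, unrelated)
edges = contingency(first, relabelled)
```

Mutual information is insensitive to how the clusters are numbered, so it rewards correspondence between the two results rather than matching labels.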

9.
10.
Serban N, Jiang H. Biometrics, 2012, 68(3): 805-814
Summary: In this article, we investigate clustering methods for multilevel functional data, which consist of repeated random functions observed for a large number of units (e.g., genes) at multiple subunits (e.g., bacteria types). To describe the within- and between-unit variability induced by the hierarchical structure in the data, we take a multilevel functional principal component analysis (MFPCA) approach. We develop and compare a hard clustering method applied to the scores derived from the MFPCA and a soft clustering method using an MFPCA decomposition. In a simulation study, we assess the estimation accuracy of the clustering membership and the cluster patterns under a series of settings: small versus moderate number of time points; various noise levels; and varying number of subunits per unit. We demonstrate the applicability of the clustering analysis to a real data set consisting of expression profiles from genes activated by immune system cells. Prevalent response patterns are identified by clustering the expression profiles using our multilevel clustering analysis.

11.

Background  

Visualization tools allow researchers to obtain a global view of the interrelationships between the probes or experiments of a gene expression (e.g. microarray) data set. Some existing methods include hierarchical clustering and k-means. In recent years, others have proposed applying minimum spanning trees (MST) for microarray clustering. Although MST-based clustering is formally equivalent to the dendrograms produced by hierarchical clustering under certain conditions, visually they can be quite different.
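One well-known instance of the equivalence mentioned in entry 11 is single-linkage clustering: building a minimum spanning tree with Kruskal's algorithm and stopping when k components remain gives the same groups as cutting a single-linkage dendrogram into k clusters. The sketch below assumes that setting, which the abstract does not spell out; it uses scalar data for brevity, and the function name and toy values are hypothetical.

```python
def mst_clusters(points, k):
    """Kruskal-style MST construction on scalar points, stopped at k components."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (abs(points[i] - points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    components = n
    for _, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it belongs to the MST
            parent[rj] = ri
            components -= 1

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Toy expression values (an assumption): two separated groups.
labels = mst_clusters([0, 1, 2, 10, 11, 12], k=2)
```

Stopping early is the same as building the full tree and then removing its k - 1 longest edges.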

12.
MOTIVATION: With the increasing number of gene expression databases, the need for more powerful analysis and visualization tools is growing. Many techniques have successfully been applied to unravel latent similarities among genes and/or experiments. Most of the current systems for microarray data analysis use statistical methods, hierarchical clustering, self-organizing maps, support vector machines, or k-means clustering to organize genes or experiments into 'meaningful' groups. Without prior explicit bias, almost all of these clustering methods applied to gene expression data not only produce different results, but may also produce clusters with little or no biological relevance. Of these methods, agglomerative hierarchical clustering has been the most widely applied, although many limitations have been identified. RESULTS: Starting with a systematic comparison of the underlying theories behind clustering approaches, we have devised a technique that combines tree-structured vector quantization and partitive k-means clustering (BTSVQ). This hybrid technique has revealed clinically relevant clusters in three large publicly available data sets. In contrast to existing systems, our approach is less sensitive to data preprocessing and data normalization. In addition, the clustering results produced by the technique have strong similarities to those of self-organizing maps (SOMs). We discuss the advantages and the mathematical reasoning behind our approach.

13.
The purpose of this study was to investigate the benefit of landmark registration when applied to waveform data. We compared the ability of data reduced from time-normalised and landmark registered vertical ground reaction force (vGRF) waveforms captured during maximal countermovement jumps (CMJ) of 53 active male subjects to predict jump height. vGRF waveforms were landmark registered using different landmarks resulting in four registration conditions: (i) end of the eccentric phase, (ii) adding maximum centre of mass (CoM) power, (iii) adding minimum CoM power, (iv) adding minimum vGRF. In addition to the four registration conditions, the non-registered vGRF and concentric phase only were time-normalised and used in subsequent analysis. Analysis of characterising phases was performed to reduce the vGRF data to features that captured the behaviour of each waveform. These features were extracted from each condition’s vGRF waveform, time-domain (time taken to complete the movement), and warping functions (generated from landmark registration). The identified features were used as predictor features to fit a step-wise multilinear regression to jump height. Features generated from the best performing registration condition were able to predict jump height to a similar extent as the concentric phase (86–87%), while all registration conditions could explain jump height to a greater extent than time-normalisation alone (65%). This suggests waveform variability was reduced as phases of the CMJ were aligned. However, findings suggest that over-registration can occur when applying landmark registration. Overall, landmark registration can improve the power to predict performance measures, as waveform data can be reduced to more appropriate performance-related features.

14.
From measures on a battery of fitness tests in elite-standard squash players on different tiers of a national performance program, we examined the relationships among test scores and player rank, and fitness factors important for squash-specific multiple-sprint ability. Thirty-one (20 men, 11 women) squash players from the England Squash performance program participated: n = 12 senior; n = 7 transition; n = 12 talented athlete scholarship scheme (TASS) players. In 1 test session and in a fixed order, the players completed a battery of tests to assess countermovement jump height, reactive strength, change-of-direction speed, and multiple-sprint ability on squash-specific tests and endurance fitness. Two-way analysis of variance compared senior, transition, and TASS players by sex on all measures except jump height where only senior and transition players were compared. Effect size (ES) was calculated for all comparisons. Pearson's correlation examined relationships among test scores and multiple-sprint ability. Spearman's ρ investigated relationships among test scores and players' rank in men and women separately. Regardless of sex, seniors outperformed TASS players on all except the endurance test (p < 0.05, ES at least 1.1). Seniors had better multiple-sprint ability than did transition players (p < 0.01, ES = 1.2). Transition outperformed TASS players on the reactive-strength test (p < 0.05, ES = 1.0). Men outperformed women in all tests at all performance program tiers (p < 0.05, ES at least 0.5). In men, rank was related to multiple-sprint ability, fastest-multiple-sprint-test repetition, and change-of-direction speed (ρ = 0.78, 0.86, 0.59, respectively). In women, rank was related to fastest multiple-sprint-test repetition (ρ = 0.65). In men and women, multiple-sprint ability was related to change-of-direction speed (r = 0.9 and 0.84) and fastest-multiple-sprint-test repetition (r = 0.96 for both) and to reactive strength in men (r = -0.71). 
The results confirm that high-intensity variable-direction exercise capabilities are important for success in elite squash.

15.
16.
The ability to generate lower body explosive power is considered an important factor in many athletic activities. Thirty-one men and women, recreationally trained volunteers, were randomly assigned to 3 different groups (control, n = 10; VertiMax, n = 11; and depth jump, n = 10). A Vertec measuring device was used to test vertical jump height pre- and post-training. All subjects trained twice weekly for 6 weeks, performing approximately 140 jumps. The VertiMax group increased elastic resistance and decreased volume each week, while the depth jump group increased both box height and volume each week. The depth jump group significantly increased their vertical jump height (pre: 20.5 ± 3.98; post: 22.65 ± 4.09), while the VertiMax (pre: 22.18 ± 4.31; post: 23.36 ± 4.06) and control groups (pre: 15.65 ± 4.51; post: 15.85 ± 4.17) did not change. These findings suggest that, within the volume and intensity constraints of this study, depth jump training twice weekly for 6 weeks is more beneficial than VertiMax jump training for increasing vertical jump height. Strength professionals should focus on depth jump exercises in the short term over commercially available devices to improve vertical jump performance.

17.
The aim of this study was to assess and compare the ability of discrete point analysis (DPA), functional principal component analysis (fPCA) and analysis of characterizing phases (ACP) to describe a dependent variable (jump height) using vertical ground reaction force curves captured during the propulsion phase of a countermovement jump. fPCA and ACP are continuous data analysis techniques that reduce the dimensionality of a data set by identifying phases of variation (key phases), which are used to generate subject scores that describe a subject's behavior. A stepwise multiple regression analysis was used to measure the ability to describe jump height of each data analysis technique. Findings indicated that the order of effectiveness (high to low) across the examined techniques was: ACP (99%), fPCA (78%) and DPA (21%). DPA was outperformed by fPCA and ACP because it can inadvertently compare unrelated features, does not analyze the whole data set and cannot examine important features that occur solely as a phase. ACP outperformed fPCA because it utilizes information within the combined magnitude-time domain, and identifies and examines key phases separately without the deleterious interaction of other key phases.

18.
Synopsis: Data matrices of fish stomach contents frequently contain many zeros, and nonzero values often do not follow usually encountered statistical distributions. Therefore, many common methods of statistical analysis are inappropriate for such data. A method of repeated k-means cluster analysis is proposed for exploratory analysis of data sets on fish stomach contents. Objective rules are proposed for setting the clustering parameters, so the arbitrariness and subjectivity common in interpreting hierarchical clustering methods are avoided. Because the clusters are nonhierarchical, the analysis method also requires much less computer time and memory. Application of the method is illustrated with a data set of 1771 stomachs of cod (Gadus morhua), feeding on 38 different prey types. The results of the clusterings reveal that nine types of prey may account for the systematic information about the diet of cod in this sample from the northern Grand Bank in spring of 1979. The results are also used to test specific hypotheses about size selectivity of the predator, spatial variation of feeding, environmental influences on diet, and relative preferences among prey taxa.

19.
MOTIVATION: Clustering techniques are used to find groups of genes that show similar expression patterns under multiple experimental conditions. Nonetheless, the results obtained by cluster analysis are influenced by the existence of missing values that commonly arise in microarray experiments. Because a clustering method requires a complete data matrix as an input, previous studies have estimated the missing values using an imputation method in the preprocessing step of clustering. However, a common limitation of these conventional approaches is that once the estimates of missing values are fixed in the preprocessing step, they are not changed during subsequent processes of clustering; badly estimated missing values obtained in data preprocessing are likely to deteriorate the quality and reliability of clustering results. Thus, a new clustering method is required that improves the estimates of missing values during the iterative clustering process. RESULTS: We present a method for Clustering Incomplete data using Alternating Optimization (CIAO) in which a prior imputation method is not required. To reduce the influence of imputation in preprocessing, we take an alternating optimization approach to find better estimates during the iterative clustering process. This method improves the estimates of missing values by exploiting the cluster information, such as cluster centroids and all available non-missing values, in each iteration. To test the performance of the CIAO, we applied the CIAO and conventional imputation-based clustering methods, e.g. k-means based on KNNimpute, for clustering two incomplete yeast data sets, and compared the clustering result of each method using the Saccharomyces Genome Database annotations. The clustering results of the CIAO method are significantly more relevant to the biological gene annotations than those of other methods, indicating its effectiveness and potential for clustering incomplete gene expression data.
AVAILABILITY: The software was developed in Java and can be executed on any platform that runs a JVM (Java Virtual Machine). It is available from the authors upon request.
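The general idea in entry 19, re-estimating missing values inside the clustering loop instead of fixing them beforehand, can be sketched with a k-means-style alternation. This is not the CIAO algorithm, whose details the abstract does not give: the distance over observed coordinates, the centroid-based fill-in, the function name and the toy data are all assumptions made for illustration.

```python
def cluster_incomplete(rows, centroids, n_iter=10):
    """Alternate between assignment and centroid updates on incomplete rows.

    rows: lists of floats with None for missing entries (each row is assumed
    to have at least one observed value). centroids: initial guesses.
    Returns (assignments, centroids, rows with missing entries filled in).
    """
    centroids = [list(c) for c in centroids]
    k, dim = len(centroids), len(centroids[0])
    assign = [0] * len(rows)

    def distance(row, c):
        # Mean squared difference over the observed coordinates only.
        obs = [(x - c[j]) ** 2 for j, x in enumerate(row) if x is not None]
        return sum(obs) / len(obs)

    for _ in range(n_iter):
        # Step 1: assign every row to its nearest centroid.
        assign = [min(range(k), key=lambda i: distance(row, centroids[i]))
                  for row in rows]
        # Step 2: update each centroid coordinate from observed member values.
        for i in range(k):
            for j in range(dim):
                vals = [row[j] for row, a in zip(rows, assign)
                        if a == i and row[j] is not None]
                if vals:
                    centroids[i][j] = sum(vals) / len(vals)

    # Missing entries are estimated from the centroid of the final cluster.
    filled = [[centroids[a][j] if x is None else x for j, x in enumerate(row)]
              for row, a in zip(rows, assign)]
    return assign, centroids, filled


# Toy data (an assumption): two groups, one missing value in each.
data = [[1.0, 1.0], [1.2, None], [0.8, 1.2],
        [5.0, 5.0], [None, 5.2], [5.2, 4.8]]
assign, cents, filled = cluster_incomplete(data, [[0.0, 0.0], [6.0, 6.0]])
```

Because the fill-in values come from the current centroids, they change whenever the cluster assignments change, which is the behaviour the abstract contrasts with one-off imputation.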

20.

Background

Clustering is a widely used technique for analysis of gene expression data. Most clustering methods group genes based on distances, while few methods group genes according to the similarities of the distributions of the gene expression levels. Furthermore, as biological annotation resources have accumulated, an increasing number of genes have been annotated into functional categories. As a result, evaluating the performance of clustering methods in terms of the functional consistency of the resulting clusters is of great interest.

Results

In this paper, we propose the WDCM (Weibull Distribution-based Clustering Method), a robust approach for clustering gene expression data, in which the gene expressions of individual genes are considered as random variables following unique Weibull distributions. Our WDCM is based on the concept that genes with similar expression profiles have similar distribution parameters, and thus the genes are clustered via the Weibull distribution parameters. We used the WDCM to cluster three cancer gene expression data sets from lung cancer, B-cell follicular lymphoma and bladder carcinoma and obtained well-clustered results. We compared the performance of WDCM with k-means and Self Organizing Map (SOM) using functional annotation information given by the Gene Ontology (GO). The results showed that the functional annotation ratios of WDCM are higher than those of the other methods. We also utilized the external measure Adjusted Rand Index to validate the performance of the WDCM. The comparative results demonstrate that the WDCM provides better clustering performance than the k-means and SOM algorithms. The merit of the proposed WDCM is that it can be applied to cluster incomplete gene expression data without imputing the missing values. Moreover, the robustness of WDCM is also evaluated on the incomplete data sets.

Conclusions

The results demonstrate that our WDCM produces clusters with more consistent functional annotations than the other methods. The WDCM is also verified to be robust and is capable of clustering gene expression data containing a small quantity of missing values.
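The Adjusted Rand Index used as an external measure in entry 20 can be computed from pair counts. The sketch below follows the standard pair-counting definition and is not taken from the paper; the function name and toy labels are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    n = len(a)
    # Pairs of items placed together in both clusterings.
    sum_ij = sum(comb(v, 2) for v in Counter(zip(a, b)).values())
    # Pairs placed together within each clustering separately.
    sum_a = sum(comb(v, 2) for v in Counter(a).values())
    sum_b = sum(comb(v, 2) for v in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (for example, every item in one cluster in both).
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Toy labels (assumptions): a relabelled copy and an unrelated partition.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ari_unrelated = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of 1 means the two partitions agree exactly, values near 0 mean agreement no better than chance, and negative values mean less agreement than chance.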
