Similar articles
 5 similar articles found
1.
The advent of microarray technology has revolutionized the life sciences, since the technique allows the expression levels of thousands of genes in a particular organism to be monitored simultaneously. However, the statistical analysis of expression data has its own challenges, primarily because of the huge amount of data to be dealt with, and also because of noise, which is almost an inherent characteristic of microarray data. Clustering is one tool used to mine meaningful patterns from microarray data. In this paper, we present a novel method of clustering yeast microarray data that is robust yet simple to implement. It identifies the best clusters in a given dataset on the basis of the population of each cluster as well as the variance of its members' feature values about the cluster center. It has been found to yield satisfactory results even in the presence of noisy data.

2.
Serban N  Jiang H 《Biometrics》2012,68(3):805-814
Summary: In this article, we investigate clustering methods for multilevel functional data, which consist of repeated random functions observed for a large number of units (e.g., genes) at multiple subunits (e.g., bacteria types). To describe the within-unit and between-unit variability induced by the hierarchical structure in the data, we take a multilevel functional principal component analysis (MFPCA) approach. We develop and compare a hard clustering method applied to the scores derived from the MFPCA and a soft clustering method using an MFPCA decomposition. In a simulation study, we assess the estimation accuracy of the clustering membership and the cluster patterns under a series of settings: small versus moderate number of time points; various noise levels; and varying number of subunits per unit. We demonstrate the applicability of the clustering analysis to a real data set consisting of expression profiles from genes activated by immune system cells. Prevalent response patterns are identified by clustering the expression profiles using our multilevel clustering analysis.

3.
Biclustering is an important tool in microarray analysis when only a subset of genes is co-regulated under a subset of conditions. Unlike standard clustering analyses, biclustering classifies simultaneously in both the gene and condition directions of a microarray data matrix. However, the biclustering problem is inherently intractable and computationally complex. In this paper, we present a new biclustering algorithm based on a geometrical view of coherent gene expression profiles. In this method, we perform pattern identification based on the Hough transform in a column-pair space. The algorithm is especially suitable for the biclustering analysis of large-scale microarray data. Our studies show that the approach can discover significant biclusters as the noise level and regulatory complexity increase. Furthermore, we also test the ability of our method to locate biologically verifiable biclusters within an annotated set of genes.

4.
Ho SY  Hsieh CH  Chen HM  Huang HL 《Bio Systems》2006,85(3):165-176
An accurate classifier with linguistic interpretability that uses a small number of relevant genes is beneficial to microarray data analysis and to the development of inexpensive diagnostic tests. Several frequently used techniques for designing classifiers of microarray data, such as support vector machines, neural networks, k-nearest neighbor, and logistic regression models, suffer from low interpretability. This paper proposes an interpretable gene expression classifier (named iGEC) with an accurate and compact fuzzy rule base for microarray data analysis. The design of iGEC has three objectives to be optimized simultaneously: maximal classification accuracy, minimal number of rules, and minimal number of used genes. An "intelligent" genetic algorithm (IGA) is used to efficiently solve the design problem with a large number of tuning parameters. The performance of iGEC is evaluated using eight commonly used data sets. It is shown that iGEC has an accurate, concise, and interpretable rule base (1.1 rules per class) on average in terms of test classification accuracy (87.9%), rule number (3.9), and used gene number (5.0). Moreover, iGEC not only performs better than the existing fuzzy rule-based classifier in terms of the above-mentioned objectives, but is also more accurate than some existing non-rule-based classifiers.

5.
Pairwise curve synchronization for functional data   Total citations: 1, self-citations: 0, citations by others: 1
Tang  Rong; Muller  Hans-Georg 《Biometrika》2008,95(4):875-889
Data collected by scientists are increasingly in the form of trajectories or curves. Often these can be viewed as realizations of a composite process driven by both amplitude and time variation. We consider the situation in which functional variation is dominated by time variation, and develop a curve-synchronization method that uses every trajectory in the sample as a reference to obtain pairwise warping functions in the first step. These initial pairwise warping functions are then used to create improved estimators of the underlying individual warping functions in the second step. A truncated averaging process is used to obtain robust estimation of individual warping functions. The method compares well with other available time-synchronization approaches and is illustrated with Berkeley growth data and gene expression data for multiple sclerosis.
