
1.

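Ouyang Yumei 《Chinese Journal of Bioinformatics (生物信息学)》2010,8(2):104-109

With the widespread application of DNA chip technology, gene expression data analysis has become one of the research hotspots in the life sciences. This paper outlines the types of gene expression clustering techniques, the classification and characteristics of the algorithms, and the visualization and annotation of results; describes some popular and novel algorithms; introduces 17 recent related software packages and online web service tools; and discusses research trends in software tools.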

2.

Hybrid hierarchical clustering with applications to microarray data

Chipman H, Tibshirani R 《Biostatistics (Oxford, England)》2006,7(2):286-301

In this paper, we propose a hybrid clustering method that combines the strengths of bottom-up hierarchical clustering with those of top-down clustering. The first method is good at identifying small clusters but not large ones; the strengths are reversed for the second method. The hybrid method is built on the new idea of a mutual cluster: a group of points closer to each other than to any other points. Theoretical connections between mutual clusters and bottom-up clustering methods are established, aiding in their interpretation and providing an algorithm for identification of mutual clusters. We illustrate the technique on simulated and real microarray datasets.
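The mutual-cluster idea in this abstract can be illustrated with a small check. This is a minimal sketch under one reading of the definition (the largest distance inside the group is smaller than the smallest distance from any member to any outside point); the function name `is_mutual_cluster`, the Euclidean distance, and the toy points are assumptions for illustration, not the authors' implementation.

```python
from itertools import combinations
from math import dist


def is_mutual_cluster(points, members):
    """Return True if the points indexed by `members` form a mutual cluster."""
    inside = [points[i] for i in members]
    outside = [p for i, p in enumerate(points) if i not in members]
    if len(inside) < 2 or not outside:
        return False
    # Largest distance between any two points inside the group.
    diameter = max(dist(a, b) for a, b in combinations(inside, 2))
    # Smallest distance from any group member to any point outside the group.
    gap = min(dist(a, b) for a in inside for b in outside)
    return diameter < gap


# Hypothetical toy data: a tight group of three points and a distant pair.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
print(is_mutual_cluster(points, {0, 1, 2}))  # tight group far from the rest
print(is_mutual_cluster(points, {2, 3}))     # group that straddles both regions
```

The brute-force check is quadratic in the number of points, which is fine for illustration but not for a full microarray dataset.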

3.

Clustering approaches to identifying gene expression patterns from DNA microarray data

Do JH, Choi DK 《Molecules and cells》2008,25(2):279-288


4.

Mfuzz: a software package for soft clustering of microarray data (cited by: 1; self-citations: 0; citations by others: 1)

Kumar L, Futschik ME 《Bioinformation》2007,2(1):5-7

For the analysis of microarray data, clustering techniques are frequently used. Most such methods are based on hard clustering of data, wherein one gene (or sample) is assigned to exactly one cluster. Hard clustering, however, suffers from several drawbacks such as sensitivity to noise and information loss. In contrast, soft clustering methods can assign a gene to several clusters. They can overcome shortcomings of conventional hard clustering techniques and offer further advantages. Thus, we constructed an R package termed Mfuzz implementing soft clustering tools for microarray data analysis. The additional package Mfuzzgui provides a convenient Tcl/Tk based graphical user interface. AVAILABILITY: The R packages Mfuzz and Mfuzzgui are available at http://itb1.biologie.hu-berlin.de/~futschik/software/R/Mfuzz/index.html. Their distribution is subject to the GPL version 2 license.
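To make the contrast between hard and soft assignment concrete, the sketch below computes graded memberships of one expression profile in several clusters. It assumes the standard fuzzy c-means membership formula with fuzzifier `m`; the abstract itself does not name the algorithm, and the function `soft_memberships` and the toy centers are hypothetical (this is not the Mfuzz R API).

```python
from math import dist


def soft_memberships(profile, centers, m=2.0):
    """Fuzzy c-means style membership of one expression profile in each cluster."""
    distances = [dist(profile, c) for c in centers]
    # A profile sitting exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # Membership falls off with relative distance; the values sum to 1.
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical two-cluster example: the profile lies closer to the first center.
centers = [(0.0, 0.0), (4.0, 0.0)]
memberships = soft_memberships((1.0, 0.0), centers)
print(memberships)
# A hard clustering would keep only the largest membership.
print(memberships.index(max(memberships)))
```

Keeping the full membership vector is what lets a gene be associated with several clusters at once, whereas the final line shows the information a hard assignment would retain.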

5.

Effective feature selection framework for cluster analysis of microarray data

Pok G, Liu JC, Ryu KH 《Bioinformation》2010,4(8):385-389

The microarray technique has become a standard means of simultaneously examining the expression of all genes measured under different circumstances. As microarray data are typically characterized by high-dimensional features with a small number of samples, feature selection needs to be incorporated to identify a subset of genes that are meaningful for biological interpretation and account for the sample variation. In this article, we present a simple yet effective feature selection framework suitable for two-dimensional microarray data. Our correlation-based, nonparametric approach allows compact representation of class-specific properties with a small number of genes. We evaluated our method using publicly available experimental data and obtained favorable results.

6.

Interactive visualization and exploration of relationships between biological objects (cited by: 2; self-citations: 0; citations by others: 2)

Gilbert DR, Schroeder M, van Helden J 《Trends in biotechnology》2000,18(12):487-494

Genome sequencing and microarray technology produce ever-increasing amounts of complex data that need analysis. Visualization is an effective analytical technique that exploits the ability of the human brain to process large amounts of data. Here, we review traditional visualization methods based on clustering and tree representation, and also describe an alternative approach that involves projecting objects onto a Euclidean space in a way that reflects their structural or functional distances. Data are visualized without preclustering and can be dynamically explored by the user using ‘virtual-reality’. We illustrate this approach with two case studies from protein topology and gene expression.

7.

Consensus miRNA expression profiles derived from interplatform normalization of microarray data

Rhishikesh Bargaje, Manoj Hariharan, Vinod Scaria, Beena Pillai 《RNA (New York, N.Y.)》2010,16(1):16-25


8.

Random forest for gene selection and microarray data classification

Moorthy K, Mohamad MS 《Bioinformation》2011,7(3):142-146

A random forest method has been selected to perform both gene selection and classification of microarray data. In this embedded method, selecting the smallest possible sets of genes with the lowest error rates is the key factor in achieving the highest classification accuracy. Hence, an improved gene selection method using random forest has been proposed to obtain the smallest subset of genes as well as the biggest subset of genes prior to classification. The option of selecting the biggest subset is provided to assist researchers who intend to use the informative genes for further research. The enhanced random forest gene selection has performed better in terms of selecting the smallest subset as well as the biggest subset of informative genes with the lowest out-of-bag error rates. Furthermore, the classification performed on the selected subset of genes using random forest has led to lower prediction error rates compared with the existing method and other similar available methods.

9.

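Identifying periodically expressed genes from microarray time-series expression data

Zhou Dao, He Dong, Zhou Yanhong 《Chinese Journal of Bioinformatics (生物信息学)》2008,6(2):68-70

Based on the periodicity and peak characteristics of periodically expressed genes, a method is proposed that divides microarray time-series expression data into several gene expression cycles and evaluates the peak characteristics within each cycle in order to identify periodically expressed genes; this effectively reduces the interference of noise in microarray experiments. Three widely used time-series expression datasets and one reliable set of periodically expressed genes were selected to test the method, and its performance was compared with that of three typical methods for identifying periodically expressed genes. The method can effectively identify periodically expressed genes from a variety of microarray time-series expression data.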

10.

Unsupervised pattern recognition: an introduction to the whys and wherefores of clustering microarray data

Boutros PC, Okey AB 《Briefings in bioinformatics》2005,6(4):331-343

Clustering has become an integral part of microarray data analysis and interpretation. The algorithmic basis of clustering (the application of unsupervised machine-learning techniques to identify the patterns inherent in a data set) is well established. This review discusses the biological motivations for and applications of these techniques to integrating gene expression data with other biological information, such as functional annotation, promoter data and proteomic data.

11.

Integrating functional knowledge during sample clustering for microarray data using unsupervised decision trees

Redestig H, Repsilber D, Sohler F, Selbig J 《Biometrical journal. Biometrische Zeitschrift》2007,49(2):214-229

Clustering of microarray gene expression data is performed routinely, for genes as well as for samples. Clustering of genes can reveal functional relationships between genes; clustering of samples, on the other hand, is important for finding e.g. disease subtypes, relevant patient groups for stratification or related treatments. Usually this is done by first filtering the genes for high variance under the assumption that they carry most of the information needed for separating different sample groups. If this assumption is violated, important groupings in the data might be lost. Furthermore, classical clustering methods do not facilitate the biological interpretation of the results. Therefore, we propose to methodologically integrate the clustering algorithm with prior biological information. This is different from other approaches, as knowledge about classes of genes can be directly used to ease the interpretation of the results and possibly boost clustering performance. Our approach computes dendrograms that resemble decision trees, with gene classes used to split the data at each node, which can help to find biologically meaningful differences between the sample groups. We have tested the proposed method on both simulated and real data and conclude that it is useful as a complementary method, especially when the assumptions of few differentially expressed genes and an informative mapping of genes to different classes are met.
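The conventional pre-filtering step that this abstract questions, keeping only high-variance genes before clustering samples, can be sketched as follows. The function name `top_variance_genes`, the use of population variance, and the toy expression values are assumptions for illustration; the abstract does not specify a particular filter.

```python
from statistics import pvariance


def top_variance_genes(expression, keep):
    """Return the names of the `keep` genes with the highest variance across samples."""
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    return ranked[:keep]


# Hypothetical expression values for three genes measured in four samples.
expression = {
    "geneA": [1.0, 1.1, 0.9, 1.0],  # nearly constant across samples
    "geneB": [0.2, 3.5, 0.1, 3.9],  # varies strongly between samples
    "geneC": [2.0, 2.6, 1.4, 2.0],  # moderate variation
}
print(top_variance_genes(expression, keep=2))
```

In this toy case `geneA` is discarded; if a nearly constant gene nevertheless carried the relevant grouping, the filter would lose it, which is the risk the abstract describes.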

12.

There is no silver bullet--a guide to low-level data transforms and normalisation methods for microarray data

Kreil DP, Russell RR 《Briefings in bioinformatics》2005,6(1):86-97

To overcome random experimental variation, even for simple screens, data from multiple microarrays have to be combined. There are, however, systematic differences between arrays, and any bias remaining after experimental measures to ensure consistency needs to be controlled for. It is often difficult to make the right choice of data transformation and normalisation methods to achieve this end. In this tutorial paper we review the problem and a selection of solutions, explaining the basic principles behind normalisation procedures and providing guidance for their application.

13.

Bioinformatics and data mining in proteomics

《Expert review of proteomics》2013,10(3):333-343

Proteomic studies involve the identification as well as qualitative and quantitative comparison of proteins expressed under different conditions, and elucidation of their properties and functions, usually in a large-scale, high-throughput format. The high dimensionality of data generated from these studies will require the development of improved bioinformatics tools and data-mining approaches for efficient and accurate data analysis of biological specimens from healthy and diseased individuals. Mining large proteomics data sets provides a better understanding of the complexities between the normal and abnormal cell proteome of various biological systems, including environmental hazards, infectious agents (bioterrorism) and cancers. This review will shed light on recent developments in bioinformatics and data-mining approaches, and their limitations when applied to proteomics data sets, in order to strengthen the interdependence between proteomic technologies and bioinformatics tools.

14.

Computing gene expression data with a knowledge-based gene clustering approach

Bruce A. Rosa, Sookyung Oh, Beronda L. Montgomery, Jin Chen, Wensheng Qin 《International Journal of Biochemistry and Molecular Biology》2010,1(1):51-68

Computational analysis methods for gene expression data gathered in microarray experiments can be used to identify the functions of previously unstudied genes. While obtaining the expression data is not a difficult task, interpreting and extracting the information from the datasets is challenging. In this study, a knowledge-based approach, which identifies and saves important functional genes before filtering based on variability and fold change differences, was used to study light regulation. Two clustering methods were used to cluster the filtered datasets, and clusters containing a key light regulatory gene were located. The genes common to both of these clusters were identified, and the genes in the common cluster were ranked based on their coexpression with the key gene. This process was repeated for 11 key genes in 3 treatment combinations. The initial filtering method reduced the dataset size from 22,814 probes to an average of 1134 genes, and the resulting common cluster lists contained an average of only 14 genes. These common cluster lists achieved higher gene enrichment scores than the two individual clustering methods. In addition, the filtering method increased the proportion of light responsive genes in the dataset from 1.8% to 15.2%, and the cluster lists increased this proportion to 18.4%. The relatively short length of these common cluster lists compared to gene groups generated through typical clustering methods or coexpression networks narrows the search for novel functional genes while increasing the likelihood that they are biologically relevant.
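The "common cluster" step described in this abstract can be sketched as a set intersection followed by a ranking. The sketch assumes Pearson correlation as the coexpression measure (the abstract does not name one), and the helper names `pearson` and `rank_common_cluster` and all gene identifiers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_common_cluster(cluster_a, cluster_b, profiles, key_gene):
    """Intersect two clusters and rank the shared genes by coexpression with key_gene."""
    common = (set(cluster_a) & set(cluster_b)) - {key_gene}
    return sorted(
        common,
        key=lambda g: pearson(profiles[g], profiles[key_gene]),
        reverse=True,
    )


# Hypothetical profiles; "key" stands in for a key regulatory gene.
profiles = {
    "key": [1.0, 2.0, 3.0, 4.0],
    "g1": [2.0, 4.0, 6.0, 8.0],
    "g2": [1.0, 3.0, 2.0, 4.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
cluster_a = ["key", "g1", "g2", "g3"]  # cluster from the first method
cluster_b = ["key", "g1", "g2"]        # cluster from the second method
ranked = rank_common_cluster(cluster_a, cluster_b, profiles, "key")
print(ranked)
```

Only genes that both clustering methods place with the key gene survive the intersection, which is why the resulting lists are short.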

15.

Multi-class clustering and prediction in the analysis of microarray data (cited by: 1; self-citations: 0; citations by others: 1)

Tsai CA, Lee TC, Ho IC, Yang UC, Chen CH, Chen JJ 《Mathematical biosciences》2005,193(1):79-100

DNA microarray technology provides tools for studying the expression profiles of a large number of distinct genes simultaneously. This technology has been applied to sample clustering and sample prediction. Because of the large number of genes measured, many of the genes in the original data set are irrelevant to the analysis. Selection of discriminatory genes is critical to the accuracy of clustering and prediction. This paper considers a statistical significance testing approach to selecting discriminatory gene sets for multi-class clustering and prediction of experimental samples. A toxicogenomic data set with nine treatments (a control and eight metals, As, Cd, Ni, Cr, Sb, Pb, Cu, and AsV, with a total of 55 samples) is used to illustrate a general framework of the approach. Among four selected gene sets, a gene set omega(I), formed by the intersection of the F-test set and the union of the one-versus-all t-test sets, performs the best in terms of clustering as well as prediction. Hierarchical and two modified partition (k-means) methods all show that the set omega(I) is able to group the 55 samples into seven clusters reasonably well, in which the As and AsV samples are considered as one cluster (the same group), as are the Cd and Cu samples. With respect to prediction, the overall accuracy for the gene set omega(I) using the nearest neighbors algorithm to predict 55 samples into one of the nine treatments is 85%.
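The construction of the gene set omega(I) described in this abstract reduces to two set operations, sketched below. The function name `omega_i` and the gene identifiers are hypothetical, and the sketch assumes the significant genes from each test have already been determined.

```python
def omega_i(f_test_genes, one_vs_all_genes):
    """Intersect the F-test gene set with the union of the one-versus-all t-test sets."""
    union_of_t_tests = set().union(*one_vs_all_genes.values())
    return set(f_test_genes) & union_of_t_tests


# Hypothetical significant genes from an F-test across all treatments.
f_test_genes = {"g1", "g2", "g3", "g5"}
# Hypothetical significant genes from each one-versus-all t-test.
one_vs_all_genes = {
    "As": {"g1", "g4"},
    "Cd": {"g2"},
    "Ni": {"g4", "g6"},
}
selected = omega_i(f_test_genes, one_vs_all_genes)
print(sorted(selected))
```

A gene is kept only if it differs overall across treatments (F-test) and also distinguishes at least one treatment from the rest (one of the t-tests).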

16.

Computational approaches to the integration of gene expression, ChIP-chip and sequence data in the inference of gene regulatory networks (cited by: 1; self-citations: 0; citations by others: 1)

Emma J. Cooke, Richard S. Savage, David L. Wild 《Seminars in cell & developmental biology》2009,20(7):863-868


17.

Interpretable gene expression classifier with an accurate and compact fuzzy rule base for microarray data analysis

Ho SY, Hsieh CH, Chen HM, Huang HL 《Bio Systems》2006,85(3):165-176

An accurate classifier with linguistic interpretability using a small number of relevant genes is beneficial to microarray data analysis and development of inexpensive diagnostic tests. Several frequently used techniques for designing classifiers of microarray data, such as support vector machine, neural networks, k-nearest neighbor, and logistic regression model, suffer from low interpretabilities. This paper proposes an interpretable gene expression classifier (named iGEC) with an accurate and compact fuzzy rule base for microarray data analysis. The design of iGEC has three objectives to be simultaneously optimized: maximal classification accuracy, minimal number of rules, and minimal number of used genes. An "intelligent" genetic algorithm IGA is used to efficiently solve the design problem with a large number of tuning parameters. The performance of iGEC is evaluated using eight commonly-used data sets. It is shown that iGEC has an accurate, concise, and interpretable rule base (1.1 rules per class) on average in terms of test classification accuracy (87.9%), rule number (3.9), and used gene number (5.0). Moreover, iGEC not only has better performance than the existing fuzzy rule-based classifier in terms of the above-mentioned objectives, but also is more accurate than some existing non-rule-based classifiers.

18.

A new geometric biclustering algorithm based on the Hough transform for analysis of large-scale microarray data

Zhao H, Liew AW, Xie X, Yan H 《Journal of theoretical biology》2008,251(2):264-274

Biclustering is an important tool in microarray analysis when only a subset of genes is co-regulated under a subset of conditions. Unlike standard clustering analyses, biclustering performs simultaneous classification in both the gene and condition directions of a microarray data matrix. However, the biclustering problem is inherently intractable and computationally complex. In this paper, we present a new biclustering algorithm based on the geometrical viewpoint of coherent gene expression profiles. In this method, we perform pattern identification based on the Hough transform in a column-pair space. The algorithm is especially suitable for the biclustering analysis of large-scale microarray data. Our studies show that the approach can discover significant biclusters with respect to the increased noise level and regulatory complexity. Furthermore, we also test the ability of our method to locate biologically verifiable biclusters within an annotated set of genes.

19.

Genetic algorithms applied to multi-class clustering for gene expression data (cited by: 2; self-citations: 0; citations by others: 2)

Pan H, Zhu J, Han D 《Genomics, Proteomics & Bioinformatics (基因组蛋白质组与生物信息学报, English edition)》2003,1(4):279-287

A hybrid GA (genetic algorithm)-based clustering (HGACLUS) schema, combining the merits of simulated annealing, was described for finding an optimal or near-optimal set of medoids. This schema maximized the clustering success by achieving internal cluster cohesion and external cluster isolation. The performance

20.

From sequencing data to gene functions: co-functional network approaches

Jung Eun Shim, Tak Lee 《Animal cells and systems》2017,21(2):77-83
