
1.

GenClust: A genetic algorithm for clustering gene expression data

Vito Di Gesú Raffaele Giancarlo Giosué Lo Bosco Alessandra Raimondi Davide Scaturro 《BMC bioinformatics》2005,6(1):289

Background

Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering.

2.

Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

Curtis Huttenhower Avi I Flamholz Jessica N Landis Sauhard Sahi Chad L Myers Kellen L Olszewski Matthew A Hibbs Nathan O Siemers Olga G Troyanskaya Hilary A Coller 《BMC bioinformatics》2007,8(1):250

Background

The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes).

3.

Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes

Ujjwal Maulik Anirban Mukhopadhyay Sanghamitra Bandyopadhyay 《BMC bioinformatics》2009,10(1):27


4.

Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes

Pierre R Bushel Russell D Wolfinger Greg Gibson 《BMC systems biology》2007,1(1):15-20

Background

Commonly employed clustering methods for analysis of gene expression data do not directly incorporate phenotypic data about the samples. Furthermore, clustering of samples with known phenotypes is typically performed in an informal fashion. The inability of clustering algorithms to incorporate biological data in the grouping process can limit proper interpretation of the data and its underlying biology.

5.

Reuse of imputed data in microarray analysis increases imputation efficiency

Ki-Yeol Kim Byoung-Jin Kim Gwan-Su Yi 《BMC bioinformatics》2004,5(1):160

Background

The imputation of missing values is necessary for the efficient use of DNA microarray data, because many clustering algorithms and some statistical analyses require a complete data set. A few imputation methods for DNA microarray data have been introduced, but the efficiency of the methods was low and the validity of imputed values in these methods had not been fully checked.

6.

ArrayMining: a modular web-application for microarray analysis combining ensemble and consensus methods with cross-study normalization

Enrico Glaab Jonathan M Garibaldi Natalio Krasnogor 《BMC bioinformatics》2009,10(1):358

Background

Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks.

7.

A multiple kernel density clustering algorithm for incomplete datasets in bioinformatics

Longlong Liao Kenli Li Keqin Li Canqun Yang Qi Tian 《BMC systems biology》2018,12(6):111

Background

Although a large number of bioinformatics datasets are available for clustering, many of them are incomplete, i.e., some data samples are missing attribute values that clustering algorithms need. A variety of clustering algorithms have been proposed in recent years, but they are usually limited to clustering complete datasets. Moreover, conventional clustering algorithms cannot obtain a trade-off between accuracy and efficiency of the clustering process, since many essential parameters are determined by the human user’s experience.

Results

The paper proposes a Multiple Kernel Density Clustering algorithm for Incomplete datasets called MKDCI. The MKDCI algorithm consists of recovering missing attribute values of input data samples, learning an optimally combined kernel for clustering the input dataset, reducing dimensionality with the optimal kernel based on multiple basis kernels, detecting cluster centroids with the Isolation Forests method, assigning clusters with arbitrary shape and visualizing the results.

Conclusions

Extensive experiments on several well-known clustering datasets in the bioinformatics field demonstrate the effectiveness of the proposed MKDCI algorithm. Compared with existing density clustering algorithms and parameter-free clustering algorithms, the proposed MKDCI algorithm tends to automatically produce clusters of better quality on incomplete bioinformatics datasets.


8.

FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data (total citations: 3; self-citations: 0; citations by others: 3)

Limin Fu Enzo Medico 《BMC bioinformatics》2007,8(1):3

Background

Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this aim, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point, and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process.

9.

Probe set filtering increases correlation between Affymetrix GeneChip and qRT-PCR expression measurements

Jakub Mieczkowski Magdalena E Tyburczy Michal Dabrowski Piotr Pokarowski 《BMC bioinformatics》2010,11(1):104


10.

Discovering biclusters in gene expression data based on high-dimensional linear geometries

Xiangchao Gan Alan Wee-Chung Liew Hong Yan 《BMC bioinformatics》2008,9(1):209


11.

Using expression arrays for copy number detection: an example from E. coli

Dmitriy Skvortsov Diana Abdueva Michael E Stitzer Steven E Finkel Simon Tavaré 《BMC bioinformatics》2007,8(1):203

Background

The sequencing of many genomes and tiling arrays consisting of millions of DNA segments spanning entire genomes have made high-resolution copy number analysis possible. Microarray-based comparative genomic hybridization (array CGH) has enabled the high-resolution detection of DNA copy number aberrations. While many of the methods and algorithms developed for the analysis of microarrays have focused on expression analysis, the same technology can be used to detect genetic alterations, using for example standard commercial Affymetrix arrays. Due to the nature of the resultant data, standard techniques for processing GeneChip expression experiments are inapplicable.

12.

Incremental genetic K-means algorithm and its application in gene expression data analysis (total citations: 1; self-citations: 0; citations by others: 1)

Yi Lu Shiyong Lu Farshad Fotouhi Youping Deng Susan J Brown 《BMC bioinformatics》2004,5(1):172

Background

In recent years, clustering algorithms have been effectively applied in molecular biology for gene expression data analysis. With the help of clustering algorithms such as K-means, hierarchical clustering, SOM, etc., genes are partitioned into groups based on the similarity between their expression profiles. In this way, functionally related genes are identified. As the amount of laboratory data in molecular biology grows exponentially each year due to advanced technologies such as microarrays, new efficient and effective methods for clustering must be developed to process this growing amount of biological data.

13.

Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies

Peter A DiMaggio Jr Scott R McAllister Christodoulos A Floudas Xiao-Jiang Feng Joshua D Rabinowitz Herschel A Rabitz 《BMC bioinformatics》2008,9(1):458

Background

The analysis of large-scale data sets via clustering techniques is utilized in a number of applications. Biclustering in particular has emerged as an important problem in the analysis of gene expression data since genes may only jointly respond over a subset of conditions. Biclustering algorithms also have important applications in sample classification where, for instance, tissue samples can be classified as cancerous or normal. Many of the methods for biclustering, and clustering algorithms in general, utilize simplified models or heuristic strategies for identifying the "best" grouping of elements according to some metric and cluster definition and thus result in suboptimal clusters.

14.

Misty Mountain clustering: application to fast unsupervised flow cytometry gating

István P Sugár Stuart C Sealfon 《BMC bioinformatics》2010,11(1):502

Background

There are many important clustering questions in computational biology for which no satisfactory method exists. Automated clustering algorithms, when applied to large, multidimensional datasets, such as flow cytometry data, prove unsatisfactory in terms of speed, problems with local minima or cluster shape bias. Model-based approaches are restricted by the assumptions of the fitting functions. Furthermore, model based clustering requires serial clustering for all cluster numbers within a user defined interval. The final cluster number is then selected by various criteria. These supervised serial clustering methods are time consuming and frequently different criteria result in different optimal cluster numbers. Various unsupervised heuristic approaches that have been developed such as affinity propagation are too expensive to be applied to datasets on the order of 10⁶ points that are often generated by high throughput experiments.

15.

Dissecting systems-wide data using mixture models: application to identify affected cellular processes

J Peter Svensson Renée X de Menezes Ingela Turesson Micheline Giphart-Gassler Harry Vrieling 《BMC bioinformatics》2005,6(1):177

Background

Functional analysis of data from genome-scale experiments, such as microarrays, requires an extensive selection of differentially expressed genes. Under many conditions, the proportion of differentially expressed genes is considerable, making the selection criteria a balance between the inclusion of false positives and the exclusion of false negatives.

16.

Microarray data mining using landmark gene-guided clustering

Pankaj Chopra Jaewoo Kang Jiong Yang HyungJun Cho Heenam Stanley Kim Min-Goo Lee 《BMC bioinformatics》2008,9(1):92

Background

Clustering is a popular data exploration technique widely used in microarray data analysis. Most conventional clustering algorithms, however, generate only one set of clusters independent of the biological context of the analysis. This is often inadequate to explore data from different biological perspectives and gain new insights. We propose a new clustering model that can generate multiple versions of different clusters from a single dataset, each of which highlights a different aspect of the given dataset.

17.

The IronChip evaluation package: a package of perl modules for robust analysis of custom microarrays

Yevhen Vainshtein Mayka Sanchez Alvis Brazma Matthias W Hentze Thomas Dandekar Martina U Muckenthaler 《BMC bioinformatics》2010,11(1):112

Background

Gene expression studies greatly contribute to our understanding of complex relationships in gene regulatory networks. However, the complexity of array design, production and manipulations are limiting factors, affecting data quality. The use of customized DNA microarrays improves overall data quality in many situations, but only if analysis tools are available for these specifically designed microarrays.

18.

CLUSS: Clustering of protein sequences based on a new similarity measure

Abdellali Kelil Shengrui Wang Ryszard Brzezinski Alain Fleury 《BMC bioinformatics》2007,8(1):286

Background

The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for the alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions".

19.

DNA microarray data and contextual analysis of correlation graphs

Jacques Rougemont Pascal Hingamp 《BMC bioinformatics》2003,4(1):15

Background

DNA microarrays are used to produce large sets of expression measurements from which specific biological information is sought. Their analysis requires efficient and reliable algorithms for dimensional reduction, classification and annotation.

20.

EXPANDER – an integrative program suite for microarray data analysis

Ron Shamir Adi Maron-Katz Amos Tanay Chaim Linhart Israel Steinfeld Roded Sharan Yosef Shiloh Ran Elkon 《BMC bioinformatics》2005,6(1):232
