Found 20 similar articles; search took 31 ms
1.
Benoît De Hertogh Bertrand De Meulder Fabrice Berger Michael Pierre Eric Bareke Anthoula Gaigneaux Eric Depiereux 《BMC bioinformatics》2010,11(1):17
Background
Recent reanalysis of spike-in datasets underscored the need for new and more accurate benchmark datasets for statistical microarray analysis. We present here a fresh method using biologically relevant data to evaluate the performance of statistical methods.
2.
Background
The development of high-throughput technologies such as yeast two-hybrid systems and mass spectrometry technologies has made it possible to generate large protein-protein interaction (PPI) datasets. Mining these datasets for underlying biological knowledge has, however, remained a challenge.
3.
Qian Xiang Xianhua Dai Yangyang Deng Caisheng He Jiang Wang Jihua Feng Zhiming Dai 《BMC bioinformatics》2008,9(1):252
Background
Accurately estimating missing values in microarray data is an important pre-processing step, because complete datasets are required by numerous expression profile analyses in bioinformatics. Although several methods have been suggested, their performance is not satisfactory for datasets with high missing percentages.
4.
Background
Although the prediction of protein-protein interactions has been extensively investigated for yeast, few such datasets exist for the far larger human proteome. Furthermore, it has recently been estimated that the overall average false positive rate of available computational and high-throughput experimental interaction datasets is as high as 90%.
5.
Background
Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry.
6.
Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset
Sung E Choe Michael Boutros Alan M Michelson George M Church Marc S Halfon 《Genome biology》2004,6(2):R16
Background
As more methods are developed to analyze RNA-profiling data, assessing their performance using control datasets becomes increasingly important.
7.
Jie Zheng Jan T Svensson Kavitha Madishetty Timothy J Close Tao Jiang Stefano Lonardi 《BMC bioinformatics》2006,7(1):7
Background
Expressed sequence tag (EST) datasets represent perhaps the largest collection of genetic information. ESTs can be exploited in a variety of biological experiments and analyses. Here we are interested in the design of overlapping oligonucleotide (overgo) probes from large unigene (EST-contigs) datasets.
8.
Jianjun Hu Haifeng Li Michael S Waterman Xianghong Jasmine Zhou 《BMC bioinformatics》2006,7(1):449-14
Background
Missing value estimation is an important preprocessing step in microarray analysis. Although several methods have been developed to solve this problem, their performance is unsatisfactory for datasets with high rates of missing data, high measurement noise, or limited numbers of samples. In fact, more than 80% of the time-series datasets in the Stanford Microarray Database contain fewer than eight samples.
9.
Background
The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools.
10.
Background
The search for cluster structure in microarray datasets is a basic problem for the so-called "-omic sciences". A difficult problem in clustering is how to handle data with a manifold structure, i.e. data that is not shaped in the form of compact clouds of points but instead forms arbitrary shapes or paths embedded in a high-dimensional space, as could be the case for some gene expression datasets.
11.
Jochen Supper Martin Strauch Dierk Wanke Klaus Harter Andreas Zell 《BMC bioinformatics》2007,8(1):334
Background
Cells dynamically adapt their gene expression patterns in response to various stimuli. This response is orchestrated into a number of gene expression modules consisting of co-regulated genes. A growing pool of publicly available microarray datasets allows the identification of modules by monitoring expression changes over time. These time-series datasets can be searched for gene expression modules by one of the many clustering methods published to date. For an integrative analysis, several time-series datasets can be joined into a three-dimensional gene-condition-time dataset, to which standard clustering or biclustering methods are, however, not applicable. We thus devise a probabilistic clustering algorithm for gene-condition-time datasets.
12.
13.
Background
Molecular experiments using multiplex strategies such as cDNA microarrays or proteomic approaches generate large datasets requiring biological interpretation. Text-based data mining tools have recently been developed to query large biological datasets of this type. PubMatrix is a web-based tool that allows simple text-based mining of the NCBI literature search service PubMed using any two lists of keyword terms, resulting in a frequency matrix of term co-occurrence.
14.
Background
Meta-analysis methods exist for combining multiple microarray datasets. However, there is a wide range of issues associated with microarray meta-analysis and a limited ability to compare the performance of different meta-analysis methods.
15.
Background
Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets.
16.
Background
Recent biological discoveries have shown that clustering large datasets is essential for better understanding biology in many areas. Spectral clustering in particular has proven to be a powerful tool amenable to many applications. However, it cannot be directly applied to large datasets due to time and memory limitations. To address this issue, we have modified spectral clustering by adding an information-preserving sampling procedure and applying a post-processing stage. We call this entire algorithm SamSPECTRAL.
17.
Background
Large-scale compilation of gene expression microarray datasets across diverse biological phenotypes provided a means of gathering a priori knowledge in the form of identification and annotation of bimodal genes in the human and mouse genomes. These switch-like genes constitute 15% of known human genes, and are enriched with genes coding for extracellular and membrane proteins. It is of interest to determine the prediction potential of bimodal genes for class discovery in large-scale datasets.
18.
Background
Recently, mass spectrometry data have been mined using a genetic algorithm to produce discriminatory models that distinguish healthy individuals from those with cancer. This algorithm is the basis for claims of 100% sensitivity and specificity in two related publicly available datasets. To date, no detailed attempts have been made to explore the properties of this genetic algorithm within proteomic applications. Here the algorithm's performance on these datasets is evaluated relative to other methods.
19.
Richard Baran Martin Robert Makoto Suematsu Tomoyoshi Soga Masaru Tomita 《BMC bioinformatics》2007,8(1):72
Background
Density plot visualizations (also referred to as heat maps or color maps) are widely used in different fields including large-scale omics studies in biological sciences. However, the current color-codings limit the visualizations to single datasets or pairwise comparisons.
20.
David D Smith Pål Sætrom Ola Snøve Jr Cathryn Lundberg Guillermo E Rivas Carlotta Glackin Garrett P Larson 《BMC bioinformatics》2008,9(1):63