
1.

Importance of data structure in comparing two dimension reduction methods for classification of microarray gene expression data

Caroline Truntzer Catherine Mercier Jacques Estève Christian Gautier Pascal Roy 《BMC bioinformatics》2007,8(1):90

Background

With the advance of microarray technology, several methods for gene classification and prognosis have already been designed. However, under various denominations, some of these methods have similar approaches. This study evaluates the influence of gene expression variance structure on the performance of methods that describe the relationship between gene expression levels and a given phenotype through projection of data onto discriminant axes.

2.

A general modular framework for gene set enrichment analysis

Marit Ackermann Korbinian Strimmer 《BMC bioinformatics》2009,10(1):47-20

Background

Analysis of microarray and other high-throughput data on the basis of gene sets, rather than individual genes, is becoming more important in genomic studies. Correspondingly, a large number of statistical approaches for detecting gene set enrichment have been proposed, but both the interrelations and the relative performance of the various methods are still very much unclear.

3.

Gene selection and classification of microarray data using random forest (total citations: 9; self-citations: 0; citations by others: 9)

Ramón Díaz-Uriarte Sara Alvarez de Andrés 《BMC bioinformatics》2006,7(1):3-13

Background

Selection of relevant genes for sample classification is a common task in most gene expression studies, where researchers try to identify the smallest possible set of genes that can still achieve good predictive performance (for instance, for future use with diagnostic purposes in clinical practice). Many gene selection approaches use univariate (gene-by-gene) rankings of gene relevance and arbitrary thresholds to select the number of genes, can only be applied to two-class problems, and use gene selection ranking criteria unrelated to the classification algorithm. In contrast, random forest is a classification algorithm well suited for microarray data: it shows excellent performance even when most predictive variables are noise, can be used when the number of variables is much larger than the number of observations and in problems involving more than two classes, and returns measures of variable importance. Thus, it is important to understand the performance of random forest with microarray data and its possible use for gene selection.

4.

Building multiclass classifiers for remote homology detection and fold recognition

Huzefa Rangwala George Karypis 《BMC bioinformatics》2006,7(1):455-16

Background

Protein remote homology detection and fold recognition are central problems in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for solving these problems. These methods are primarily used to solve binary classification problems and they have not been extensively used to solve the more general multiclass remote homology prediction and fold recognition problems.

5.

SignS: a parallelized, open-source, freely available, web-based tool for gene selection and molecular signatures for survival and censored data

Ramon Diaz-Uriarte 《BMC bioinformatics》2008,9(1):30

Background

Censored data are increasingly common in many microarray studies that attempt to relate gene expression to patient survival. Several new methods have been proposed in the last two years. Most of these methods, however, are not available to biomedical researchers, leading to many re-implementations from scratch of ad-hoc, and suboptimal, approaches with survival data.

6.

Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data

Chia Huey Ooi Madhu Chetty Shyh Wei Teng 《BMC bioinformatics》2006,7(1):320

Background

Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied on microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost, and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy.

7.

Stability of gene contributions and identification of outliers in multivariate analysis of microarray data

Florent Baty Daniel Jaeger Frank Preiswerk Martin M Schumacher Martin H Brutsche 《BMC bioinformatics》2008,9(1):289

Background

Multivariate ordination methods are powerful tools for the exploration of complex data structures present in microarray data. These methods have several advantages compared to common gene-by-gene approaches. However, due to their exploratory nature, multivariate ordination methods do not allow direct statistical testing of the stability of genes.

8.

Multiclass classification of microarray data samples with a reduced number of genes

Elizabeth Tapia Leonardo Ornella Pilar Bulacio Laura Angelone 《BMC bioinformatics》2011,12(1):59

Background

Multiclass classification of microarray data samples with a reduced number of genes is a rich and challenging problem in Bioinformatics research. The problem gets harder as the number of classes is increased. In addition, the performance of most classifiers is tightly linked to the effectiveness of mandatory gene selection methods. Critical to gene selection is the availability of estimates about the maximum number of genes that can be handled by any classification algorithm. Lack of such estimates may lead to either computationally demanding explorations of a search space with thousands of dimensions or classification models based on gene sets of unrestricted size. In the former case, unbiased but possibly overfitted classification models may arise. In the latter case, biased classification models unable to support statistically significant findings may be obtained.

9.

Rank-invariant resampling based estimation of false discovery rate for analysis of small sample microarray data

Nitin Jain HyungJun Cho Michael O'Connell Jae K Lee 《BMC bioinformatics》2005,6(1):187

Background

The evaluation of statistical significance has become a critical process in identifying differentially expressed genes in microarray studies. Classical p-value adjustment methods for multiple comparisons such as family-wise error rate (FWER) have been found to be too conservative in analyzing large-screening microarray data, and the False Discovery Rate (FDR), the expected proportion of false positives among all positives, has been recently suggested as an alternative for controlling false positives. Several statistical approaches have been used to estimate and control FDR, but these may not provide reliable FDR estimation when applied to microarray data sets with a small number of replicates.
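As a rough illustration of what controlling the FDR means in practice, the sketch below implements the classical Benjamini-Hochberg step-up procedure. This is an assumption chosen for illustration only: it is not the rank-invariant resampling estimator of the article, and the function name `benjamini_hochberg` and the default `alpha` are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Illustrative Benjamini-Hochberg step-up procedure (an assumption,
    not the method of the article listed above).
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected
```

Because the procedure is step-up, a gene whose p-value misses its own threshold can still be rejected when a higher-ranked p-value passes its threshold.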

10.

Comparing transformation methods for DNA microarray data

Helene H Thygesen Aeilko H Zwinderman 《BMC bioinformatics》2004,5(1):77

Background

When DNA microarray data are used for gene clustering, genotype/phenotype correlation studies, or tissue classification, the signal intensities are usually transformed and normalized in several steps in order to improve comparability and signal/noise ratio. These steps may include subtraction of an estimated background signal, subtracting the reference signal, smoothing (to account for nonlinear measurement effects), and more. Different authors use different approaches, and it is generally not clear to users which method they should prefer.

11.

NOXclass: prediction of protein-protein interaction types

Hongbo Zhu Francisco S Domingues Ingolf Sommer Thomas Lengauer 《BMC bioinformatics》2006,7(1):27-15

Background

Structural models determined by X-ray crystallography play a central role in understanding protein-protein interactions at the molecular level. Interpretation of these models requires the distinction between non-specific crystal packing contacts and biologically relevant interactions. This has been investigated previously and classification approaches have been proposed. However, less attention has been devoted to distinguishing different types of biological interactions. These interactions are classified as obligate and non-obligate according to the effect of the complex formation on the stability of the protomers. So far no automatic classification methods for distinguishing obligate, non-obligate and crystal packing interactions have been made available.

12.

Classification and biomarker identification using gene network modules and support vector machines

Malik Yousef Mohamed Ketany Larry Manevitz Louise C Showe Michael K Showe 《BMC bioinformatics》2009,10(1):337

Background

Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. The selection of the genes most significant to the classification problem is a challenging issue in high dimension data analysis and interpretation. A previous study with SVM-RCE (Recursive Cluster Elimination) suggested that classification based on groups of correlated genes sometimes exhibits better performance than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes.

13.

FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data (total citations: 3; self-citations: 0; citations by others: 3)

Limin Fu Enzo Medico 《BMC bioinformatics》2007,8(1):3

Background

Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this aim, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point, and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process.

14.

Comparison of small n statistical tests of differential expression applied to microarrays

Carl Murie Owen Woody Anna Y Lee Robert Nadon 《BMC bioinformatics》2009,10(1):45-18

Background

DNA microarrays provide data for genome wide patterns of expression between observation classes. Microarray studies often have small sample sizes, however, due to cost constraints or specimen availability. This can lead to poor random error estimates and inaccurate statistical tests of differential expression. We compare the performance of the standard t-test, fold change, and four small n statistical test methods designed to circumvent these problems. We report results of various normalization methods for empirical microarray data and of various random error models for simulated data.
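To make the two baseline statistics named above concrete, here is a minimal sketch of a log2 fold change and a pooled-variance two-sample t statistic for a single gene. The function names are hypothetical, and the choice of the pooled-variance (Student) form is an assumption; the article's exact definitions are not given in this excerpt.

```python
import math
from statistics import mean, variance


def log2_fold_change(group_a, group_b):
    """Mean log2 intensity of group_b minus mean log2 intensity of group_a."""
    return mean(math.log2(x) for x in group_b) - mean(math.log2(x) for x in group_a)


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic with pooled variance (assumed 'standard' form)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled sample variance across both groups.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    # Standard error of the difference in means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_b) - mean(group_a)) / std_err
```

With only two or three replicates per class, the variance estimate in the denominator is itself noisy, which is the small-n problem the abstract describes.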

15.

R/BHC: fast Bayesian hierarchical clustering for microarray data

Richard S Savage Katherine Heller Yang Xu Zoubin Ghahramani William M Truman Murray Grant Katherine J Denby David L Wild 《BMC bioinformatics》2009,10(1):242

Background

Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained.

16.

The limit fold change model: A practical approach for selecting differentially expressed genes from microarray data

David M Mutch Alvin Berger Robert Mansourian Andreas Rytz Matthew-Alan Roberts 《BMC bioinformatics》2002,3(1):17-11

Background

The biomedical community is developing new methods of data analysis to more efficiently process the massive data sets produced by microarray experiments. Systematic and global mathematical approaches that can be readily applied to a large number of experimental designs become fundamental to correctly handle the otherwise overwhelming data sets.

17.

SED, a normalization free method for DNA microarray data analysis

Huajun Wang Hui Huang 《BMC bioinformatics》2004,5(1):121

Background

Analysis of DNA microarray data usually begins with a normalization step where intensities of different arrays are adjusted to the same scale so that the intensity levels from different arrays can be compared with one another. Both simple total array intensity-based as well as more complex "local intensity level" dependent normalization methods have been developed, some of which are widely used. Much less developed methods for microarray data analysis include those that bypass the normalization step and therefore yield results that are not confounded by potential normalization errors.
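The simplest normalization mentioned above, total array intensity scaling, can be sketched as follows. This is an illustrative assumption about what such a step typically looks like, not code from the article; the function name `total_intensity_normalize` and the default target (the mean of the array totals) are hypothetical.

```python
def total_intensity_normalize(arrays, target=None):
    """Rescale each array so that its summed intensity equals a common target.

    arrays: list of lists of non-negative intensities, one inner list per array.
    target: desired total per array; defaults to the mean of the array totals.
    """
    totals = [sum(a) for a in arrays]
    if target is None:
        target = sum(totals) / len(totals)
    # Multiply every intensity on an array by the same factor target / total.
    return [[x * target / t for x in a] for a, t in zip(arrays, totals)]
```

A single global factor per array cannot correct intensity-dependent bias, which is why the abstract contrasts it with "local intensity level" dependent methods.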

18.

Gene Features Selection for Three-Class Disease Classification via Multiple Orthogonal Partial Least Square Discriminant Analysis and S-Plot Using Microarray Data

Mingxing Yang Xiumin Li Zhibin Li Zhimin Ou Ming Liu Suhuan Liu Xuejun Li Shuyu Yang 《PloS one》2013,8(12)

Motivation

DNA microarray analysis is characterized by obtaining a large number of gene variables from a small number of observations. Cluster analysis is widely used to analyze DNA microarray data to make classification and diagnosis of disease. Because there are so many irrelevant and insignificant genes in a dataset, a feature selection approach must be employed in data analysis. The performance of cluster analysis of this high-throughput data depends on whether the feature selection approach chooses the most relevant genes associated with disease classes.

Results

Here we proposed a new method using multiple Orthogonal Partial Least Squares-Discriminant Analysis (mOPLS-DA) models and S-plots to select the most relevant genes to conduct three-class disease classification and prediction. We tested our method using Golub’s leukemia microarray data. For three classes with subtypes, we proposed hierarchical orthogonal partial least squares-discriminant analysis (OPLS-DA) models and S-plots to select features for two main classes and their subtypes. For three classes in parallel, we employed three OPLS-DA models and S-plots to choose marker genes for each class. The power of feature selection to classify and predict three-class disease was evaluated using cluster analysis. Further, the general performance of our method was tested using four public datasets and compared with those of four other feature selection methods. The results revealed that our method effectively selected the most relevant features for disease classification and prediction, and its performance was better than that of the other methods.

19.

PCA disjoint models for multiclass cancer analysis using gene expression data (total citations: 4; self-citations: 0; citations by others: 4)

Bicciato S Luchini A Di Bello C 《Bioinformatics (Oxford, England)》2003,19(5):571-578

MOTIVATION: Microarray expression profiling appears particularly promising for a deeper understanding of cancer biology and to identify molecular signatures supporting the histological classification schemes of neoplastic specimens. However, molecular diagnostics based on microarray data presents major challenges due to the overwhelming number of variables and the complex, multiclass nature of tumor samples. Thus, the development of marker selection methods, that allow the identification of those genes that are most likely to confer high classification accuracy of multiple tumor types, and of multiclass classification schemes is of paramount importance. RESULTS: A computational procedure for marker identification and for classification of multiclass gene expression data through the application of disjoint principal component models is described. The identified features represent a rational and dimensionally reduced base for understanding the basic biology of diseases, defining targets for therapeutic intervention, and developing diagnostic tools for the identification and classification of multiple pathological states. The method has been tested on different microarray data sets obtained from various human tumor samples. The results demonstrate that this procedure allows the identification of specific phenotype markers and can classify previously unseen instances in the presence of multiple classes.

20.

A survey of motif discovery methods in an integrated framework

Geir Kjetil Sandve Finn Drabløs 《Biology direct》2006,1(1):11-16

Background

There has been a growing interest in computational discovery of regulatory elements, and a multitude of motif discovery methods have been proposed. Computational motif discovery has been used with some success in simple organisms like yeast. However, as we move to higher organisms with more complex genomes, more sensitive methods are needed. Several recent methods try to integrate additional sources of information, including microarray experiments (gene expression and ChIP-chip). There is also a growing awareness that regulatory elements work in combination, and that this combinatorial behavior must be modeled for successful motif discovery. However, the multitude of methods and approaches makes it difficult to get a good understanding of the current status of the field.