期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

CLOE: Identification of putative functional relationships among genes by comparison of expression profiles between two species

Maurizio?Pellegrino Paolo?Provero Lorenzo?Silengo Ferdinando?Di Cunto Email author 《BMC bioinformatics》2004,5(1):179

Background

Public repositories of microarray data contain an incredible amount of information that is potentially relevant to explore functional relationships among genes by meta-analysis of expression profiles. However, the widespread use of this resource by the scientific community is at the moment limited by the limited availability of effective tools of analysis. We here describe CLOE, a simple cDNA microarray data mining strategy based on meta-analysis of datasets from pairs of species. The method consists in ranking EST probes in the datasets of the two species according to the similarity of their expression profiles with that of two EST probes from orthologous genes, and extracting orthologous EST pairs from a given top interval of the ranked lists. The Gene Ontology annotation of the obtained candidate partners is then analyzed for keywords overrepresentation. 相似文献

2.

Cross-platform comparison and visualisation of gene expression data using co-inertia analysis

Aedín?C?Culhane Email author Guy?Perrière Desmond?G?Higgins 《BMC bioinformatics》2003,4(1):59

相似文献

3.

Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data

Ian B Jeffery Desmond G Higgins Aedín C Culhane 《BMC bioinformatics》2006,7(1):359-16

Background

Numerous feature selection methods have been applied to the identification of differentially expressed genes in microarray data. These include simple fold change, classical t-statistic and moderated t-statistics. Even though these methods return gene lists that are often dissimilar, few direct comparisons of these exist. We present an empirical study in which we compare some of the most commonly used feature selection methods. We apply these to 9 publicly available datasets, and compare, both the gene lists produced and how these perform in class prediction of test datasets. 相似文献

4.

Classification and biomarker identification using gene network modules and support vector machines

Malik Yousef Mohamed Ketany Larry Manevitz Louise C Showe Michael K Showe 《BMC bioinformatics》2009,10(1):337

Background

Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. The selection of the genes most significant to the classification problem is a challenging issue in high dimension data analysis and interpretation. A previous study with SVM-RCE (Recursive Cluster Elimination), suggested that classification based on groups of correlated genes sometimes exhibits better performance than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes. 相似文献

5.

Expression profiles of switch-like genes accurately classify tissue and infectious disease phenotypes in model-based classification

Michael Gormley Aydin Tozeren 《BMC bioinformatics》2008,9(1):486

Background

Large-scale compilation of gene expression microarray datasets across diverse biological phenotypes provided a means of gathering a priori knowledge in the form of identification and annotation of bimodal genes in the human and mouse genomes. These switch-like genes consist of 15% of known human genes, and are enriched with genes coding for extracellular and membrane proteins. It is of interest to determine the prediction potential of bimodal genes for class discovery in large-scale datasets. 相似文献

6.

Comprehensive analysis of forty yeast microarray datasets reveals a novel subset of genes (APha-RiB) consistently negatively associated with ribosome biogenesis

Basel Abu-Jamous Rui Fa David J Roberts Asoke K Nandi 《BMC bioinformatics》2014,15(1)

相似文献

7.

Average rank-based score to measure deregulation of molecular pathway gene sets

Yang H Cheng C Zhang W 《PloS one》2011,6(11):e27579

Background

Deregulation of biological pathways has been shown to be involved in the turmorigenesis of a variety of cancers. The co-regulation of pathways in tumor and normal tissues has not been studied in a systematic manner.

Results

In this study we propose a novel statistic named AR-score (average rank based score) to measure pathway activities based on microarray gene expression profiles. We calculate and compare the AR-scores of pathways in microarray datasets containing expression profiles for a wide range of cancer types as well as the corresponding normal tissues. We find that many pathways undergo significant activity changes in tumors with respect to normal tissues. AR-scores for a small subset of pathways are capable of distinguishing tumor from normal tissues or classifying tumor subtypes. In normal tissues many pathways are highly correlated in their activities, whereas their correlations reduce significantly in tumors and cancer cell lines. The co-expression of genes in the same pathways was also significantly perturbed in tumors.

Conclusions

The co-regulation of genes in the same pathways and co-regulation of different pathways are significantly perturbed in tumors versus normal tissues. Our method provides a useful tool for better understanding the mechanistic changes in tumors, which can also be used for exploring other biological problems. 相似文献

8.

Comparison of gene expression microarray data with count-based RNA measurements informs microarray interpretation

Arianne C Richard Paul A Lyons James E Peters Daniele Biasci Shaun M Flint James C Lee Eoin F McKinney Richard M Siegel Kenneth GC Smith 《BMC genomics》2014,15(1)

Background

Although numerous investigations have compared gene expression microarray platforms, preprocessing methods and batch correction algorithms using constructed spike-in or dilution datasets, there remains a paucity of studies examining the properties of microarray data using diverse biological samples. Most microarray experiments seek to identify subtle differences between samples with variable background noise, a scenario poorly represented by constructed datasets. Thus, microarray users lack important information regarding the complexities introduced in real-world experimental settings. The recent development of a multiplexed, digital technology for nucleic acid measurement enables counting of individual RNA molecules without amplification and, for the first time, permits such a study.

Results

Using a set of human leukocyte subset RNA samples, we compared previously acquired microarray expression values with RNA molecule counts determined by the nCounter Analysis System (NanoString Technologies) in selected genes. We found that gene measurements across samples correlated well between the two platforms, particularly for high-variance genes, while genes deemed unexpressed by the nCounter generally had both low expression and low variance on the microarray. Confirming previous findings from spike-in and dilution datasets, this “gold-standard” comparison demonstrated signal compression that varied dramatically by expression level and, to a lesser extent, by dataset. Most importantly, examination of three different cell types revealed that noise levels differed across tissues.

Conclusions

Microarray measurements generally correlate with relative RNA molecule counts within optimal ranges but suffer from expression-dependent accuracy bias and precision that varies across datasets. We urge microarray users to consider expression-level effects in signal interpretation and to evaluate noise properties in each dataset independently.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-649) contains supplementary material, which is available to authorized users. 相似文献

9.

Differential prioritization between relevance and redundancy in correlation-based feature selection techniques for multiclass gene expression data

Chia Huey Ooi Madhu Chetty Shyh Wei Teng 《BMC bioinformatics》2006,7(1):320

Background

Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied on microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost, and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy. 相似文献

10.

Recursive Cluster Elimination (RCE) for classification and feature selection from gene expression data

Malik Yousef Segun Jung Louise C Showe Michael K Showe 《BMC bioinformatics》2007,8(1):144

Background

Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared, poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE. 相似文献

11.

A Platform for Processing Expression of Short Time Series (PESTS)

Anshu Sinha Marianthi Markatou 《BMC bioinformatics》2011,12(1):13

Background

Time course microarray profiles examine the expression of genes over a time domain. They are necessary in order to determine the complete set of genes that are dynamically expressed under given conditions, and to determine the interaction between these genes. Because of cost and resource issues, most time series datasets contain less than 9 points and there are few tools available geared towards the analysis of this type of data. 相似文献

12.

Statistical Test of Expression Pattern (STEPath): a new strategy to integrate gene expression data with genomic information in individual and meta-analysis studies

Paolo Martini Davide Risso Gabriele Sales Chiara Romualdi Gerolamo Lanfranchi Stefano Cagnin 《BMC bioinformatics》2011,12(1):92

相似文献

13.

A permutation-based multiple testing method for time-course microarray experiments

Insuk Sohn Kouros Owzar Stephen L George Sujong Kim Sin-Ho Jung 《BMC bioinformatics》2009,10(1):336

Background

Time-course microarray experiments are widely used to study the temporal profiles of gene expression. Storey et al. (2005) developed a method for analyzing time-course microarray studies that can be applied to discovering genes whose expression trajectories change over time within a single biological group, or those that follow different time trajectories among multiple groups. They estimated the expression trajectories of each gene using natural cubic splines under the null (no time-course) and alternative (time-course) hypotheses, and used a goodness of fit test statistic to quantify the discrepancy. The null distribution of the statistic was approximated through a bootstrap method. Gene expression levels in microarray data are often complicatedly correlated. An accurate type I error control adjusting for multiple testing requires the joint null distribution of test statistics for a large number of genes. For this purpose, permutation methods have been widely used because of computational ease and their intuitive interpretation. 相似文献

14.

FiGS: a filter-based gene selection workbench for microarray data

Taeho Hwang Choong-Hyun Sun Taegyun Yun Gwan-Su Yi 《BMC bioinformatics》2010,11(1):50

Background

The selection of genes that discriminate disease classes from microarray data is widely used for the identification of diagnostic biomarkers. Although various gene selection methods are currently available and some of them have shown excellent performance, no single method can retain the best performance for all types of microarray datasets. It is desirable to use a comparative approach to find the best gene selection result after rigorous test of different methodological strategies for a given microarray dataset. 相似文献

15.

Gene Features Selection for Three-Class Disease Classification via Multiple Orthogonal Partial Least Square Discriminant Analysis and S-Plot Using Microarray Data

Mingxing Yang Xiumin Li Zhibin Li Zhimin Ou Ming Liu Suhuan Liu Xuejun Li Shuyu Yang 《PloS one》2013,8(12)

Motivation

DNA microarray analysis is characterized by obtaining a large number of gene variables from a small number of observations. Cluster analysis is widely used to analyze DNA microarray data to make classification and diagnosis of disease. Because there are so many irrelevant and insignificant genes in a dataset, a feature selection approach must be employed in data analysis. The performance of cluster analysis of this high-throughput data depends on whether the feature selection approach chooses the most relevant genes associated with disease classes.

Results

Here we proposed a new method using multiple Orthogonal Partial Least Squares-Discriminant Analysis (mOPLS-DA) models and S-plots to select the most relevant genes to conduct three-class disease classification and prediction. We tested our method using Golub’s leukemia microarray data. For three classes with subtypes, we proposed hierarchical orthogonal partial least squares-discriminant analysis (OPLS-DA) models and S-plots to select features for two main classes and their subtypes. For three classes in parallel, we employed three OPLS-DA models and S-plots to choose marker genes for each class. The power of feature selection to classify and predict three-class disease was evaluated using cluster analysis. Further, the general performance of our method was tested using four public datasets and compared with those of four other feature selection methods. The results revealed that our method effectively selected the most relevant features for disease classification and prediction, and its performance was better than that of the other methods. 相似文献

16.

An Integrated Approach for Identifying Wrongly Labelled Samples When Performing Classification in Microarray Data

Yuk Yee Leung Chun Qi Chang Yeung Sam Hung 《PloS one》2012,7(10)

Background

Using hybrid approach for gene selection and classification is common as results obtained are generally better than performing the two tasks independently. Yet, for some microarray datasets, both classification accuracy and stability of gene sets obtained still have rooms for improvement. This may be due to the presence of samples with wrong class labels (i.e. outliers). Outlier detection algorithms proposed so far are either not suitable for microarray data, or only solve the outlier detection problem on their own.

Results

We tackle the outlier detection problem based on a previously proposed Multiple-Filter-Multiple-Wrapper (MFMW) model, which was demonstrated to yield promising results when compared to other hybrid approaches (Leung and Hung, 2010). To incorporate outlier detection and overcome limitations of the existing MFMW model, three new features are introduced in our proposed MFMW-outlier approach: 1) an unbiased external Leave-One-Out Cross-Validation framework is developed to replace internal cross-validation in the previous MFMW model; 2) wrongly labeled samples are identified within the MFMW-outlier model; and 3) a stable set of genes is selected using an L1-norm SVM that removes any redundant genes present. Six binary-class microarray datasets were tested. Comparing with outlier detection studies on the same datasets, MFMW-outlier could detect all the outliers found in the original paper (for which the data was provided for analysis), and the genes selected after outlier removal were proven to have biological relevance. We also compared MFMW-outlier with PRAPIV (Zhang et al., 2006) based on same synthetic datasets. MFMW-outlier gave better average precision and recall values on three different settings. Lastly, artificially flipped microarray datasets were created by removing our detected outliers and flipping some of the remaining samples'' labels. Almost all the ‘wrong’ (artificially flipped) samples were detected, suggesting that MFMW-outlier was sufficiently powerful to detect outliers in high-dimensional microarray datasets. 相似文献

17.

Optimized between-group classification: a new jackknife-based gene selection procedure for genome-wide expression data

Florent?Baty Email author Michel?P?Bihl Guy?Perrière Aedín?C?Culhane Martin?H?Brutsche 《BMC bioinformatics》2005,6(1):239

Background

A recent publication described a supervised classification method for microarray data: Between Group Analysis (BGA). This method which is based on performing multivariate ordination of groups proved to be very efficient for both classification of samples into pre-defined groups and disease class prediction of new unknown samples. Classification and prediction with BGA are classically performed using the whole set of genes and no variable selection is required. We hypothesize that an optimized selection of highly discriminating genes might improve the prediction power of BGA. 相似文献

18.

Meta-analysis of breast cancer microarray studies in conjunction with conserved <Emphasis Type="Italic">cis</Emphasis>-elements suggest patterns for coordinate regulation

David D Smith Pål Sætrom Ola SnøveJr Cathryn Lundberg Guillermo E Rivas Carlotta Glackin Garrett P Larson 《BMC bioinformatics》2008,9(1):63

Background

Gene expression measurements from breast cancer (BrCa) tumors are established clinical predictive tools to identify tumor subtypes, identify patients showing poor/good prognosis, and identify patients likely to have disease recurrence. However, diverse breast cancer datasets in conjunction with diagnostic clinical arrays show little overlap in the sets of genes identified. One approach to identify a set of consistently dysregulated candidate genes in these tumors is to employ meta-analysis of multiple independent microarray datasets. This allows one to compare expression data from a diverse collection of breast tumor array datasets generated on either cDNA or oligonucleotide arrays. 相似文献

19.

Predicting survival outcomes using subsets of significant genes in prognostic marker studies with microarrays

Shigeyuki Matsui 《BMC bioinformatics》2006,7(1):156-9

Background

Genetic markers hold great promise for refining our ability to establish precise prognostic prediction for diseases. The development of comprehensive gene expression microarray technology has allowed the selection of relevant marker genes from a large pool of candidate genes in early-phased, developmental prognostic marker studies. The primary analytical task in such studies is to select a small fraction of relevant genes, typically from a list of significant genes, for further investigation in subsequent studies. 相似文献

20.

The identification of informative genes from multiple datasets with increasing complexity

S Yahya Anvar Peter AC 't Hoen Allan Tucker 《BMC bioinformatics》2010,11(1):32

Background

In microarray data analysis, factors such as data quality, biological variation, and the increasingly multi-layered nature of more complex biological systems complicates the modelling of regulatory networks that can represent and capture the interactions among genes. We believe that the use of multiple datasets derived from related biological systems leads to more robust models. Therefore, we developed a novel framework for modelling regulatory networks that involves training and evaluation on independent datasets. Our approach includes the following steps: (1) ordering the datasets based on their level of noise and informativeness; (2) selection of a Bayesian classifier with an appropriate level of complexity by evaluation of predictive performance on independent data sets; (3) comparing the different gene selections and the influence of increasing the model complexity; (4) functional analysis of the informative genes. 相似文献