Similar articles
1.

Background

Microarray technology, like other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and the interpretability of a classifier can be enhanced by basing the classification only on selected discriminative genes. We propose a statistical method for selecting genes based on an analysis of how expression data overlap across classes. This method yields a novel measure, called the proportional overlapping score (POS), of a feature's relevance to a classification task.

Results

We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves better performance.

Conclusions

A novel gene selection method, POS, is proposed. POS analyzes the overlap of expression values across classes, taking into account the proportions of overlapping samples. It robustly defines a mask for each gene, which allows it to minimize the effect of expression outliers. The constructed masks, along with a novel gene score, are used to produce the selected subset of genes.
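The idea of scoring a gene by the proportion of samples that fall where the classes overlap can be illustrated with a minimal sketch. This is a simplified stand-in and not the published POS: the function name `overlap_proportion` and the expression values are hypothetical, and the sketch uses each class's full min-max range, so it lacks the outlier-resistant masks described above.

```python
def overlap_proportion(class_a, class_b):
    """Fraction of samples whose expression lies in the range shared by both classes.

    Simplified illustration only: class ranges are plain [min, max] intervals,
    so a single outlier can widen the overlap (POS itself guards against this).
    """
    low = max(min(class_a), min(class_b))    # lower bound of the shared interval
    high = min(max(class_a), max(class_b))   # upper bound of the shared interval
    if low > high:                           # the class ranges do not intersect
        return 0.0
    values = list(class_a) + list(class_b)
    inside = sum(1 for v in values if low <= v <= high)
    return inside / len(values)


# Hypothetical expression values for two genes measured in two classes.
gene_separating = overlap_proportion([1.0, 1.2, 1.5], [3.0, 3.4, 3.1])
gene_overlapping = overlap_proportion([1.0, 2.0, 3.0], [2.5, 3.5, 4.0])
print(gene_separating, gene_overlapping)
```

A lower value suggests a more discriminative gene: the first gene has no samples in a shared range, while a third of the second gene's samples fall in the overlap.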

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-274) contains supplementary material, which is available to authorized users.

2.

Background

Despite the recent identification of several prognostic gene signatures, the lack of common genes among experimental cohorts has made it difficult to uncover the molecular basis of hepatocellular carcinoma (HCC) recurrence for clinical application. To overcome the limitations of individual gene-based analysis, we applied a pathway-based approach to the analysis of HCC recurrence.

Results

By implementing a permutation-based semi-supervised principal component analysis algorithm using the optimal principal component, we selected sixty-four pathways associated with hepatitis B virus (HBV)-positive HCC recurrence (p < 0.01) from our microarray dataset of 142 HBV-positive HCCs. In comparison with the public HBV-positive and hepatitis C virus (HCV)-positive HCC datasets, we detected 46 (71.9%) and 18 (28.1%) common recurrence-associated pathways, respectively. However, overlap of recurrence-associated genes between datasets was rare, further supporting the utility of the pathway-based approach for recurrence analysis across different HCC datasets. Unsupervised clustering of the 64 recurrence-associated pathways classified HCC patients into high- and low-risk subgroups based on risk of recurrence (p < 0.0001). The identified pathways also successfully discriminated subgroups by recurrence risk within the public HCC datasets. In multivariate analysis, these recurrence-associated pathways were identified as an independent prognostic factor (p < 0.0001), along with tumor number, tumor size and Edmondson's grade. Moreover, the pathway-based approach had a clinical advantage in discriminating the high-risk subgroup (N = 12) among patients (N = 26) with small HCC (<3 cm).

Conclusions

Using pathway-based analysis, we successfully identified the pathways involved in recurrence of HBV-positive HCC that may be effectively used as prognostic markers.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1472-x) contains supplementary material, which is available to authorized users.

3.

Background

Sequencing datasets consist of a finite number of reads which map to specific regions of a reference genome. Most effort in modeling these datasets focuses on the detection of univariate differentially expressed genes. However, for classification, we must consider multiple genes and their interactions.

Results

Thus, we introduce a hierarchical multivariate Poisson model (MP) and the associated optimal Bayesian classifier (OBC) for classifying samples using sequencing data. Lacking closed-form solutions, we employ a Markov chain Monte Carlo (MCMC) approach to perform classification. We demonstrate superior or equivalent classification performance compared to typical classifiers for two synthetic datasets and over a range of classification problem difficulties. We also introduce the Bayesian minimum mean squared error (MMSE) conditional error estimator and demonstrate its computation over the feature space. In addition, we demonstrate superior or leading class performance over an RNA-Seq dataset containing two lung cancer tumor types from The Cancer Genome Atlas (TCGA).

Conclusions

Through model-based, optimal Bayesian classification, we demonstrate superior classification performance for both synthetic and real RNA-Seq datasets. A tutorial video and Python source code are available under an open source license at http://bit.ly/1gimnss.
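To make the idea of classifying samples from read counts concrete, here is a deliberately simplified sketch. It is not the hierarchical multivariate Poisson model or the optimal Bayesian classifier from the abstract, and it involves no MCMC: it swaps in a plug-in classifier that treats genes as independent Poisson variables and assumes equal class priors. All names, the pseudocount, and the counts are hypothetical.

```python
import math


def poisson_log_likelihood(counts, rates):
    """Log-likelihood of read counts under independent per-gene Poisson rates."""
    return sum(k * math.log(r) - r - math.lgamma(k + 1)
               for k, r in zip(counts, rates))


def estimate_rates(samples, pseudocount=0.5):
    """Per-gene mean count for one class; the pseudocount keeps every rate above zero."""
    n = len(samples)
    return [(sum(column) + pseudocount) / n for column in zip(*samples)]


def classify(counts, rates_by_class):
    """Pick the class with the highest log-likelihood (equal priors assumed)."""
    return max(rates_by_class,
               key=lambda label: poisson_log_likelihood(counts, rates_by_class[label]))


# Hypothetical training counts: two samples per class, two genes per sample.
training = {
    "type_a": [[10, 2], [12, 1]],
    "type_b": [[3, 9], [2, 11]],
}
rates = {label: estimate_rates(samples) for label, samples in training.items()}
predicted = classify([11, 3], rates)
print(predicted)
```

Because the genes are modeled independently, this sketch ignores exactly the gene interactions that motivate the multivariate model; it only shows the count-based likelihood comparison at the core of such classifiers.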

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0401-3) contains supplementary material, which is available to authorized users.

4.

Background

Prediction of gene regulatory networks (GRNs) from expression data is a challenging task. Many methods have been developed to address this challenge, ranging from supervised to unsupervised approaches. The most promising methods are based on the support vector machine (SVM). There is a need for a comprehensive analysis of the prediction accuracy of the supervised SVM method using different kernels under different biological experimental conditions and network sizes.

Results

We developed a tool (CompareSVM) based on SVM to compare different kernel methods for inference of GRNs. Using CompareSVM, we investigated and evaluated different SVM kernel methods in detail on simulated microarray datasets of different sizes. The results obtained from CompareSVM showed that the accuracy of an inference method depends on the nature of the experimental condition and the size of the network.

Conclusions

For networks with fewer than 200 nodes, and on average over all network sizes, the SVM Gaussian kernel outperforms all the other inference methods on knockout, knockdown, and multifactorial datasets. For networks with a large number of nodes (~500), the choice of inference method depends on the nature of the experimental condition. CompareSVM is available at http://bis.zju.edu.cn/CompareSVM/.
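The kernels being compared differ in how they score the similarity between two expression profiles. The sketch below shows three common kernel functions in plain Python; the function names, parameter names and default values are illustrative assumptions and do not reflect CompareSVM's actual settings or code.

```python
import math


def linear_kernel(x, y):
    """Dot product of two expression profiles."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree."""
    return (linear_kernel(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical two-condition expression profiles for a pair of genes.
profile_x = [1.0, 2.0]
profile_y = [2.0, 0.0]
print(linear_kernel(profile_x, profile_y))      # dot product
print(polynomial_kernel(profile_x, profile_y))  # grows with the dot product
print(gaussian_kernel(profile_x, profile_y))    # decays with distance, at most 1
```

The Gaussian kernel depends only on the distance between profiles and equals 1 for identical profiles, whereas the linear and polynomial kernels depend on their dot product.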

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0395-x) contains supplementary material, which is available to authorized users.

5.

Background

Although numerous investigations have compared gene expression microarray platforms, preprocessing methods and batch correction algorithms using constructed spike-in or dilution datasets, there remains a paucity of studies examining the properties of microarray data using diverse biological samples. Most microarray experiments seek to identify subtle differences between samples with variable background noise, a scenario poorly represented by constructed datasets. Thus, microarray users lack important information regarding the complexities introduced in real-world experimental settings. The recent development of a multiplexed, digital technology for nucleic acid measurement enables counting of individual RNA molecules without amplification and, for the first time, permits such a study.

Results

Using a set of human leukocyte subset RNA samples, we compared previously acquired microarray expression values with RNA molecule counts determined by the nCounter Analysis System (NanoString Technologies) in selected genes. We found that gene measurements across samples correlated well between the two platforms, particularly for high-variance genes, while genes deemed unexpressed by the nCounter generally had both low expression and low variance on the microarray. Confirming previous findings from spike-in and dilution datasets, this “gold-standard” comparison demonstrated signal compression that varied dramatically by expression level and, to a lesser extent, by dataset. Most importantly, examination of three different cell types revealed that noise levels differed across tissues.

Conclusions

Microarray measurements generally correlate with relative RNA molecule counts within optimal ranges but suffer from expression-dependent accuracy bias and precision that varies across datasets. We urge microarray users to consider expression-level effects in signal interpretation and to evaluate noise properties in each dataset independently.
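The two quantities discussed above, cross-platform correlation and signal compression, can be illustrated with a small sketch. It assumes log2-scale values for a single gene across samples; the data are invented for illustration, and the helper names `pearson` and `least_squares_slope` are hypothetical. A regression slope below 1 of microarray signal on molecule counts is used here as a simple indicator of compression, which is an assumption of this sketch and not necessarily the authors' procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x


# Hypothetical log2 values for one gene across four samples.
counts_log2 = [2.0, 4.0, 6.0, 8.0]   # molecule counts (reference platform)
array_log2 = [5.0, 6.0, 7.0, 8.0]    # microarray signal

r = pearson(counts_log2, array_log2)
slope = least_squares_slope(counts_log2, array_log2)
print(r, slope)
```

In this toy data the platforms agree perfectly on ranking, yet a two-unit change in log2 counts appears as only a one-unit change on the array, which is the kind of compression a correlation coefficient alone would not reveal.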

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-649) contains supplementary material, which is available to authorized users.

9.

Background

Aflatoxin is a potent carcinogen that can contaminate grain infected with the fungus Aspergillus flavus. However, resistance to aflatoxin accumulation in maize is a complex trait with low heritability. Here, two complementary analyses were performed to better understand the mechanisms involved. The first coupled results of a genome-wide association study (GWAS) that accounted for linkage disequilibrium among single nucleotide polymorphisms (SNPs) with gene-set enrichment for a pathway-based approach. The rationale was that the cumulative effects of genes in a pathway would give insight into genetic differences that distinguish resistant from susceptible lines of maize. The second involved finding non-pathway genes close to the most significant SNP-trait associations with the greatest effect on reducing aflatoxin in multiple environments. Unlike conventional GWAS, the latter analysis emphasized multiple aspects of SNP-trait associations rather than just significance and was performed because of the high genotype x environment variability exhibited by this trait.

Results

The most significant metabolic pathway identified was jasmonic acid (JA) biosynthesis. Specifically, there was at least one allelic variant for each step in the JA biosynthesis pathway that conferred an incremental decrease to the level of aflatoxin observed among the inbred lines in the GWAS panel. Several non-pathway genes were also consistently associated with lowered aflatoxin levels. Those with predicted functions related to defense were: leucine-rich repeat protein kinase, expansin B3, reversion-to-ethylene sensitivity1, adaptor protein complex2, and a multidrug and toxic compound extrusion protein.

Conclusions

Our genetic analysis provided strong evidence for several genes that were associated with aflatoxin resistance. Inbred lines that exhibited lower levels of aflatoxin accumulation tended to share similar haplotypes for genes specifically in the pathway of JA biosynthesis, along with several non-pathway genes with putative defense-related functions. Knowledge gained from these two complementary analyses has improved our understanding of population differences in aflatoxin resistance.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1874-9) contains supplementary material, which is available to authorized users.

14.

Introduction

Systemic lupus erythematosus (SLE) is an autoimmune disease associated with a break in self-tolerance reflected by a production of antinuclear autoantibodies. Since autoantibody production can be activated via nucleic acid Toll-like receptor 9 (TLR9), the respective pathway has been implicated in the development of SLE and pathogenic B cell responses. However, the response of B cells from SLE patients to TLR9 stimulation remains incompletely characterized.

Methods

In the current study, the response of B cells from SLE patients and healthy donors upon TLR9 stimulation was analyzed in terms of proliferation and cytokine production and correlated with the lupus disease activity and anti-dsDNA titers.

Results

B cells from SLE patients showed a reduced response to TLR9 agonist compared to B cells from healthy donors in terms of proliferation and activation. B cells from SLE patients with higher disease activity produced less interleukin (IL)-6, IL-10, vascular endothelial growth factor, and IL-1ra than B cells from healthy donors. Further analyses revealed an inverse correlation of cytokines produced by TLR9-stimulated B cells with lupus disease activity and anti-dsDNA titer, respectively.

Conclusion

The capacity of B cells from lupus patients to produce cytokines upon TLR9 engagement becomes less efficient with increasing disease activity, suggesting that they either enter an exhausted state or become tolerant to TLR stimulation for cytokine production when disease worsens.

Electronic supplementary material

The online version of this article (doi:10.1186/s13075-014-0477-1) contains supplementary material, which is available to authorized users.

15.

Background

Thyroid cancer (TC) is the most common malignant cancer of the endocrine system. Histologically, there are three main subtypes of TC: follicular, papillary and anaplastic. Diagnosing a thyroid tumor subtype with a high level of accuracy and confidence is still a difficult task because the genetic, molecular and cellular mechanisms underlying the transition from differentiated to undifferentiated thyroid tumors are not well understood. A genome-wide analysis of these three subtypes of thyroid carcinoma was carried out in order to identify significant differences in expression levels as well as enriched pathways for non-shared molecular and cellular features between subtypes.

Results

The inhibition of matrix metalloproteinases pathway is a major event in thyroid cancer progression, and its dysregulation may be crucial for invasiveness, migration and metastasis. This pathway is drastically altered in anaplastic thyroid carcinoma (ATC), whereas in follicular (FTC) and papillary (PTC) thyroid carcinoma the most important pathways are related to DNA-repair activation or cell-to-cell signaling events.

Conclusion

A progression from FTC to PTC and then to ATC was detected and validated on two independent datasets. Moreover, the PTX3, COLEC12 and PDGFRA genes were identified as candidate biomarkers of ATC, while GPR110 could be tested to distinguish PTC from other tumor subtypes. The genome-wide analysis emphasizes the preponderance of pathway dysregulation over simple gene malfunction as the main mechanism involved in the development of a cancer phenotype.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1372-0) contains supplementary material, which is available to authorized users.
