
1.

Large Scale Comparison of Gene Expression Levels by Microarrays and RNAseq Using TCGA Data

Yan Guo Quanhu Sheng Jiang Li Fei Ye David C. Samuels Yu Shyr 《PloS one》2013,8(8)

RNAseq and microarray methods are frequently used to measure gene expression levels. While similar in purpose, there are fundamental differences between the two technologies. Here, we present the largest comparative study between microarray and RNAseq methods to date using The Cancer Genome Atlas (TCGA) data. We found high correlations between expression data obtained from the Affymetrix one-channel microarray and RNAseq (Spearman correlation coefficients of ∼0.8). We also observed that the low abundance genes had poorer correlations between microarray and RNAseq data than high abundance genes. As expected, due to measurement and normalization differences, Agilent two-channel microarray and RNAseq data were poorly correlated (Spearman correlation coefficients of only ∼0.2). By examining the differentially expressed genes between tumor and normal samples we observed reasonable concordance in directionality between Agilent two-channel microarray and RNAseq data, although a small group of genes were found to have expression changes reported in opposite directions using these two technologies. Overall, RNAseq produces comparable results to microarray technologies in terms of expression profiling. The RNAseq normalization methods RPKM and RSEM produce similar results on the gene level and reasonably concordant results on the exon level. Longer exons tended to have better concordance between the two normalization methods than shorter exons.
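The platform comparison above rests on Spearman rank correlation. The following is a minimal sketch of that statistic in plain Python, not the authors' pipeline. The function names are hypothetical, and the toy vectors in the usage note are invented for illustration rather than taken from TCGA.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `spearman([1, 2, 3, 4], [1, 4, 9, 16])` evaluates to 1.0 because the relationship is monotone even though it is not linear. That property makes a rank-based measure a natural choice when two platforms report expression on different scales.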

2.

RTCGAToolbox: A New Tool for Exporting TCGA Firehose Data

Mehmet Kemal Samur 《PloS one》2014,9(9)

Background & Objective

Managing data from large-scale projects (such as The Cancer Genome Atlas (TCGA)) for further analysis is an important and time-consuming step for research projects. Several efforts, such as the Firehose project, make TCGA pre-processed data publicly available via web services and data portals, but this information must be managed, downloaded and prepared for subsequent steps. We have developed an open-source and extensible R-based data client for pre-processed data from the Firehose, and demonstrate its use with sample case studies. Results show that our RTCGAToolbox can facilitate data management for researchers interested in working with TCGA data. The RTCGAToolbox can also be integrated with other analysis pipelines for further data processing.

Availability and implementation

The RTCGAToolbox is open-source and licensed under the GNU General Public License Version 2.0. All documentation and source code for RTCGAToolbox are freely available at http://mksamur.github.io/RTCGAToolbox/ for Linux and Mac OS X operating systems.

3.

Predicting Rotator Cuff Tears Using Data Mining and Bayesian Likelihood Ratios

Hsueh-Yi Lu Chen-Yuan Huang Chwen-Tzeng Su Chen-Chiang Lin 《PloS one》2014,9(4)

Objectives

Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone.

Methods

In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models.

Results

Our proposed data mining procedures outperformed the classic statistical method. The correction rate, sensitivity, specificity and area under the ROC curve of predicting a rotator cuff tear were statistically better in the ANN and decision tree models than in logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability that a patient has a rotator cuff tear using a pretest probability and a prediction result (tear or no tear).

Conclusions

Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as determine the probability of the presence of the disease to enhance diagnostic decision making for rotator cuff tears.
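The likelihood-ratio step described in this abstract is the arithmetic that Fagan's nomogram performs graphically: pretest odds multiplied by a likelihood ratio give post-test odds. A minimal sketch follows; the function names are hypothetical, and the sensitivity, specificity and pretest values in the usage note are invented for illustration and are not results from the study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary classifier or diagnostic test."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pretest_probability, likelihood_ratio):
    """Apply Bayes' theorem in odds form: post-test odds = pretest odds * LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)
```

With an assumed sensitivity of 0.9 and specificity of 0.8, `likelihood_ratios(0.9, 0.8)` gives an LR+ of about 4.5. For a patient with an assumed pretest probability of 0.5, a "tear" prediction then raises the probability to about 0.82 via `post_test_probability(0.5, 4.5)`.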

4.

Zombies in TCGA

Daniel DiMaio 《Journal of virology》2015,89(8):4044-4046

Next-generation sequencing results obtained to detect somatic mutations in human cancers can also be searched for viruses that contribute to cancer. Recently, human papillomavirus 18 RNA was detected in tumor types not typically associated with HPV infection. Analyses reported in this issue of Journal of Virology demonstrate that the apparent presence of HPV18 RNA in these atypical tumors is due in at least some cases to contamination of samples with HeLa cells, which harbor HPV18.

5.

Inference of Boolean Networks Using Sensitivity Regularization

Wenbin Liu Harri Lähdesmäki Edward R Dougherty Ilya Shmulevich 《EURASIP Journal on Bioinformatics and Systems Biology》2008,2008(1):780541

The inference of genetic regulatory networks from global measurements of gene expressions is an important problem in computational biology. Recent studies suggest that such dynamical molecular systems are poised at a critical phase transition between an ordered and a disordered phase, affording the ability to balance stability and adaptability while coordinating complex macroscopic behavior. We investigate whether incorporating this dynamical system-wide property as an assumption in the inference process is beneficial in terms of reducing the inference error of the designed network. Using Boolean networks, for which there are well-defined notions of ordered, critical, and chaotic dynamical regimes as well as well-studied inference procedures, we analyze the expected inference error relative to deviations in the networks' dynamical regimes from the assumption of criticality. We demonstrate that taking criticality into account via a penalty term in the inference procedure improves the accuracy of prediction both in terms of state transitions and network wiring, particularly for small sample sizes.
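The title's notion of sensitivity can be illustrated with the average sensitivity of a Boolean function: the expected number of single-input flips that change the output, averaged over all input states. In the Boolean-network literature an expected sensitivity of 1 marks the critical regime, with lower values ordered and higher values chaotic. The sketch below computes that quantity. The `penalized_error` function and its absolute-deviation form are assumptions made for illustration, not the penalty term used in the paper.

```python
from itertools import product


def average_sensitivity(f, k):
    """Average, over all 2**k inputs, of how many single-bit flips change f."""
    total = 0
    for x in product((0, 1), repeat=k):
        for i in range(k):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            if f(x) != f(flipped):
                total += 1
    return total / 2 ** k


def penalized_error(prediction_error, sensitivities, weight):
    """Hypothetical objective: data error plus a penalty on distance from criticality."""
    mean_sensitivity = sum(sensitivities) / len(sensitivities)
    return prediction_error + weight * abs(mean_sensitivity - 1.0)


def and_gate(x):
    return x[0] & x[1]


def xor_gate(x):
    return x[0] ^ x[1]
```

Here `average_sensitivity(and_gate, 2)` is 1.0 and `average_sensitivity(xor_gate, 2)` is 2.0, so under the illustrative penalty a network built from XOR-like rules would be penalized relative to one built from AND-like rules.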

6.

Inference of Boolean Networks Using Sensitivity Regularization

Wenbin Liu Harri Lähdesmäki Edward R Dougherty Ilya Shmulevich 《EURASIP Journal on Bioinformatics and Systems Biology》2007,2008(1):1-12


7.

Reconfigurable Boolean Logic Using Magnetic Single-Electron Transistors

M. Fernando Gonzalez-Zalba Chiara Ciccarelli Liviu P. Zarbo Andrew C. Irvine Richard C. Campion Bryan L. Gallagher Tomas Jungwirth Andrew J. Ferguson Joerg Wunderlich 《PloS one》2015,10(4)

We propose a novel hybrid single-electron device for reprogrammable low-power logic operations, the magnetic single-electron transistor (MSET). The device consists of an aluminium single-electron transistor with a GaMnAs magnetic back-gate. Changing between different logic gate functions is realized by reorienting the magnetic moments of the magnetic layer, which induces a voltage shift on the Coulomb blockade oscillations of the MSET. We show that we can arbitrarily reprogram the function of the device from an n-type SET for in-plane magnetization of the GaMnAs layer to p-type SET for out-of-plane magnetization orientation. Moreover, we demonstrate a set of reprogrammable Boolean gates and its logical complement at the single device level. Finally, we propose two sets of reconfigurable binary gates using combinations of two MSETs in a pull-down network.

8.

Research Conducted Using Data Obtained through Online Communities: Ethical Implications of Methodological Limitations

A. Cecile J. W. Janssens Peter Kraft 《PLoS medicine》2012,9(10)


9.

A Method for Constructing Gene Boolean Network Models from Data Noise

王丽琴李建更李岩《生物信息学》2009,7(1):40-43

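Noise is often present when gene data are analyzed. The noise in gene expression profile data can therefore be used to build Karnaugh maps, from which the logic functions of a Boolean network are obtained. The method is also used to determine the logical relationships between proteins and to build a logical network of proteins. With this method, logical relationships can be found in protein data from clusters of orthologous groups.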

10.

Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

Ujjwal Maulik Saurav Mallik Anirban Mukhopadhyay Sanghamitra Bandyopadhyay 《PloS one》2015,10(4)

Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers from biological datasets using integrated approaches of statistical and binary inclusion-maximal biclustering techniques. At first, a novel statistical strategy is utilized to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. The corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to determine how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy and other related factors with other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers also starts with the same post-discretized data matrix. Finally, we have also included the integrated analysis of gene expression and methylation for determining the epigenetic effect (viz., effect of methylation) on gene expression level.

11.

Objective Definition of Rosette Shape Variation Using a Combined Computer Vision and Data Mining Approach

Anyela Camargo Dimitra Papadopoulou Zoi Spyropoulou Konstantinos Vlachonasios John H. Doonan Alan P. Gay 《PloS one》2014,9(5)


12.

Discovery Analysis of TCGA Data Reveals Association between Germline Genotype and Survival in Ovarian Cancer Patients

Rosemary Braun Richard Finney Chunhua Yan Qing-Rong Chen Ying Hu Michael Edmonson Daoud Meerzaman Kenneth Buetow 《PloS one》2013,8(3)

Background

Ovarian cancer remains a significant public health burden, with the highest mortality rate of all the gynecological cancers. This is attributable to the late stage at which the majority of ovarian cancers are diagnosed, coupled with the low and variable response of advanced tumors to standard chemotherapies. To date, clinically useful predictors of treatment response remain lacking. Identifying the genetic determinants of ovarian cancer survival and treatment response is crucial to the development of prognostic biomarkers and personalized therapies that may improve outcomes for the late-stage patients who comprise the majority of cases.

Methods

To identify constitutional genetic variations contributing to ovarian cancer mortality, we systematically investigated associations between germline polymorphisms and ovarian cancer survival using data from The Cancer Genome Atlas Project (TCGA). Using stage-stratified Cox proportional hazards regression, we examined 650,000 SNP loci for association with survival. We additionally examined whether the association of significant SNPs with survival was modified by somatic alterations.

Results

Germline polymorphisms at rs4934282 (AGAP11/C10orf116) and rs1857623 (DNAH14) were associated with stage-adjusted survival (p = 1.12e-07 and 1.80e-07, FDR = 1.2e-04 and 2.4e-04, respectively). A third SNP, rs4869 (C10orf116), was additionally identified as significant in the exome sequencing data; it is in near-perfect LD with rs4934282. The associations with survival remained significant when somatic alterations were taken into account.

Conclusions

Discovery analysis of TCGA data reveals germline genetic variations that may play a role in ovarian cancer survival even among late-stage cases. The significant loci are located near genes previously reported as having a possible relationship to platinum and taxol response. Because the variant alleles at the significant loci are common (frequencies for rs4934282 A/C alleles = 0.54/0.46, respectively; rs1857623 A/G alleles = 0.55/0.45, respectively) and germline variants can be assayed noninvasively, our findings provide potential targets for further exploration as prognostic biomarkers and individualized therapies.

13.

Modeling Information Quality Risk for Data Mining in Data Warehouses

Ying Su Jie Peng Zhanming Jin 《Human and Ecological Risk Assessment》2009,15(2):332-350

Information Quality (IQ) is a critical factor for the success of many activities in the information age, including the development of data warehouses and implementation of data mining. The issue of IQ risk is recognized during the process of data mining; however, there is no formal methodological approach to dealing with such issues.

Consequently, it is essential to measure the risk of IQ in a data warehouse to ensure success in implementing data mining. This article presents a methodology to determine three IQ risk characteristics: accuracy, comprehensiveness, and non-membership. The methodology provides a set of quantitative models to examine how the quality risks of source information affect the quality for information outputs produced using the relational algebra operations: Restriction, Projection, and Cubic product. It can be used to determine how quality risks associated with diverse data sources affect the derived data. The study also develops a data cube model and associated algebra to support IQ risk operations.

14.

From Data towards Knowledge: Revealing the Architecture of Signaling Systems by Unifying Knowledge Mining and Data Mining of Systematic Perturbation Data

Songjian Lu Bo Jin L. Ashley Cowart Xinghua Lu 《PloS one》2013,8(4)

Genetic and pharmacological perturbation experiments, such as deleting a gene and monitoring gene expression responses, are powerful tools for studying cellular signal transduction pathways. However, it remains a challenge to automatically derive knowledge of a cellular signaling system at a conceptual level from systematic perturbation-response data. In this study, we explored a framework that unifies knowledge mining and data mining towards the goal. The framework consists of the following automated processes: 1) applying an ontology-driven knowledge mining approach to identify functional modules among the genes responding to a perturbation in order to reveal potential signals affected by the perturbation; 2) applying a graph-based data mining approach to search for perturbations that affect a common signal; and 3) revealing the architecture of a signaling system by organizing signaling units into a hierarchy based on their relationships. Applying this framework to a compendium of yeast perturbation-response data, we have successfully recovered many well-known signal transduction pathways; in addition, our analysis has led to many new hypotheses regarding the yeast signal transduction system; finally, our analysis automatically organized perturbed genes as a graph reflecting the architecture of the yeast signaling system. Importantly, this framework transformed molecular findings from a gene level to a conceptual level, which can be readily translated into computable knowledge in the form of rules regarding the yeast signaling system, such as “if genes involved in the MAPK signaling are perturbed, genes involved in pheromone responses will be differentially expressed.”

15.

Variability in Regularity: Mining Temporal Mobility Patterns in London, Singapore and Beijing Using Smart-Card Data

Chen Zhong Michael Batty Ed Manley Jiaqiu Wang Zijia Wang Feng Chen Gerhard Schmitt 《PloS one》2016,11(2)

To discover regularities in human mobility is of fundamental importance to our understanding of urban dynamics, and essential to city and transport planning, urban management and policymaking. Previous research has revealed universal regularities at mainly aggregated spatio-temporal scales but when we zoom into finer scales, considerable heterogeneity and diversity are observed instead. The fundamental question we address in this paper is at what scales are the regularities we detect stable, explicable, and sustainable. This paper thus proposes a basic measure of variability to assess the stability of such regularities focusing mainly on changes over a range of temporal scales. We demonstrate this by comparing regularities in the urban mobility patterns in three world cities, namely London, Singapore and Beijing, using one week of smart-card data. The results show that variations in regularity scale as non-linear functions of the temporal resolution, which we measure over a scale from 1 minute to 24 hours, thus reflecting the diurnal cycle of human mobility. A particularly dramatic increase in variability occurs up to the temporal scale of about 15 minutes in all three cities and this implies that limits exist when we look forward or backward with respect to making short-term predictions. The degree of regularity varies in fact from city to city with Beijing and Singapore showing higher regularity in comparison to London across all temporal scales. A detailed discussion is provided, which relates the analysis to various characteristics of the three cities. In summary, this work contributes to a deeper understanding of regularities in patterns of transit use from variations in volumes of travellers entering subway stations, it establishes a generic analytical framework for comparative studies using urban mobility data, and it provides key points for the management of variability by policy-makers intent on making the travel experience more amenable.

16.

A Bioinformatics Study of Human Rabies in Hubei Province Using Data Mining Techniques

张巧珍吴雯婷李紫萱赵心博胡兵刘聪隋正伟刘宏图章乐《中国生物工程杂志》2021,(2):14-29

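Rabies is an acute zoonotic disease with a fatality rate approaching 100%. Hubei Province is one of the areas of high rabies incidence in central China, so an epidemiological investigation of rabies in the province not only helps clarify the rabies epidemic risk that China currently faces but also provides useful guidance for rabies prevention and control both locally and nationwide. First, descriptive analysis showed that the number of rabies cases in Hubei Province declined over time while the number of rabies exposures rose, and that in recent years the central and western regions...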

17.

mtDNA Data Mining in GenBank Needs Surveying

Yong-Gang Yao Ian Logan 《American journal of human genetics》2009,85(6):929-933


18.

Data Mining in Bioinformatics (BIOKDD)

Mohammed J Zaki George Karypis Jiong Yang 《Algorithms for molecular biology : AMB》2007,2(1):1-2


19.

Data Mining in Bioinformatics (BIOKDD)

Mohammed J Zaki George Karypis Jiong Yang 《Algorithms for molecular biology : AMB》2007,2(1):4


20.

Complementing ODE-Based System Analysis Using Boolean Networks Derived from an Euler-Like Transformation

Claudia Stötzel Susanna Röblitz Heike Siebert 《PloS one》2015,10(10)

In this paper, we present a systematic transition scheme for a large class of ordinary differential equations (ODEs) into Boolean networks. Our transition scheme can be applied to any system of ODEs whose right hand sides can be written as sums and products of monotone functions. It performs an Euler-like step which uses the signs of the right hand sides to obtain the Boolean update functions for every variable of the corresponding discrete model. The discrete model can, on one hand, be considered as another representation of the biological system or, alternatively, it can be used to further the analysis of the original ODE model. Since the generic transformation method does not guarantee any property conservation, a subsequent validation step is required. Depending on the purpose of the model this step can be based on experimental data or ODE simulations and characteristics. Analysis of the resulting Boolean model, both on its own and in comparison with the ODE model, then allows the investigation of system properties not accessible in a purely continuous setting. The method is applied, as an example, to a previously published model of the bovine estrous cycle, which leads to new insights regarding the regulation among the components, and also indicates strongly that the system is tailored to generate stable oscillations.
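The idea of deriving Boolean update functions from the signs of ODE right hand sides can be illustrated loosely as follows. This is a simplified sketch under stated assumptions, not the paper's transition scheme: each Boolean state is evaluated directly as a continuous value of 0 or 1, a positive right hand side sets the variable to 1, a negative one sets it to 0, and a zero leaves it unchanged. The two-variable feedback system and all names are invented for illustration.

```python
def sign_step(rhs_functions, state):
    """One synchronous Boolean update taken from the signs of the ODE right hand sides."""
    next_state = []
    for current, rhs in zip(state, rhs_functions):
        derivative = rhs(state)
        if derivative > 0:
            next_state.append(1)        # variable is increasing: switch on
        elif derivative < 0:
            next_state.append(0)        # variable is decreasing: switch off
        else:
            next_state.append(current)  # no change indicated: keep the value
    return tuple(next_state)


# Hypothetical toy system: y inhibits x, and x activates y.
toy_rhs = (
    lambda s: (1 - s[1]) - 0.5,  # dx/dt is positive only while y is off
    lambda s: s[0] - 0.5,        # dy/dt is positive only while x is on
)


def trajectory(rhs_functions, start, steps):
    """Iterate sign_step and return the visited states, including the start."""
    states = [start]
    for _ in range(steps):
        states.append(sign_step(rhs_functions, states[-1]))
    return states
```

Calling `trajectory(toy_rhs, (0, 0), 4)` visits (0, 0), (1, 0), (1, 1), (0, 1) and returns to (0, 0), so the derived Boolean model cycles, which is the kind of oscillatory behavior a negative feedback loop would be expected to show.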