1.

Predicting adverse drug reactions through interpretable deep learning framework

Sanjoy Dey, Heng Luo, Achille Fokoue, Jianying Hu, Ping Zhang. BMC Bioinformatics, 2018, 19(21):476

Background

Adverse drug reactions (ADRs) are unintended and harmful reactions caused by normal uses of drugs. Predicting and preventing ADRs in the early stage of the drug development pipeline can help to enhance drug safety and reduce financial costs.

Methods

In this paper, we developed machine learning models, including a deep learning framework that can simultaneously predict ADRs and identify the molecular substructures associated with those ADRs without defining the substructures a priori.

Results

We compared the performance of our model against ten different state-of-the-art fingerprint models and found that neural fingerprints from the deep learning model outperformed all other methods in predicting ADRs. Through feature analysis of drug structures, we identified important molecular substructures associated with specific ADRs and assessed these associations via statistical analysis.
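The abstract does not describe the network itself. As a rough illustration of what a learned ("neural") molecular fingerprint is, the sketch below follows the general graph-convolution idea: atom feature vectors are repeatedly updated from their bonded neighbours, and at every radius each atom adds a softmax-weighted "soft bit" to a fixed-length fingerprint. It is a simplified, untrained stand-in (one random weight matrix per radius); the function name `neural_fingerprint`, the atom encoding and all sizes are hypothetical and not taken from the paper.

```python
import math
import random


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def neural_fingerprint(atom_features, bonds, radius=2, fp_len=8, seed=0):
    """Hypothetical graph-convolutional fingerprint with random (untrained) weights.

    atom_features: one feature vector per atom, all of equal length
    bonds: (i, j) index pairs of bonded atoms
    """
    rng = random.Random(seed)
    dim = len(atom_features[0])
    neighbours = {i: [] for i in range(len(atom_features))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    fingerprint = [0.0] * fp_len
    h = [list(f) for f in atom_features]
    for _ in range(radius):
        H = rand_matrix(dim, dim, rng)     # neighbourhood update weights
        W = rand_matrix(fp_len, dim, rng)  # fingerprint output weights
        new_h = []
        for i, own in enumerate(h):
            summed = list(own)             # own features plus bonded neighbours
            for j in neighbours[i]:
                summed = [a + b for a, b in zip(summed, h[j])]
            new_h.append([math.tanh(x) for x in matvec(H, summed)])
        h = new_h
        for atom in h:
            # each atom distributes one unit of mass over the fingerprint bins
            probs = softmax(matvec(W, atom))
            fingerprint = [f + p for f, p in zip(fingerprint, probs)]
    return fingerprint


# toy three-atom chain C-C-O, atoms one-hot encoded as [is_C, is_O]
fp = neural_fingerprint([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [(0, 1), (1, 2)])
# len(fp) == 8 and sum(fp) is 3 atoms x 2 radii = 6, up to rounding
```

In a trained model the weights would be fitted jointly with the ADR classifier, so individual fingerprint bins can come to respond to particular substructures.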

Conclusions

The deep learning model with feature analysis, substructure identification, and statistical assessment provides a promising solution for identifying risky components within molecular structures and can potentially help to improve drug safety evaluation.

2.

Discovering functional impacts of miRNAs in cancers using a causal deep learning model

Lujia Chen, Xinghua Lu. BMC Medical Genomics, 2018, 11(6):116

3.

GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization

Hung-I Harry Chen, Yu-Chiao Chiu, Tinghe Zhang, Songyao Zhang, Yufei Huang, Yidong Chen. BMC Systems Biology, 2018, 12(8):142

Background

Bioinformatics tools have been developed to interpret gene expression data at the gene set level, and these gene-set-based analyses improve biologists' ability to discover the functional relevance of their experimental design. However, while gene sets are elucidated individually, associations between gene sets are rarely taken into consideration. Deep learning, an emerging machine learning technique in computational biology, can be used to generate unbiased combinations of gene sets and, by leveraging large genomic data sets, to determine the biological relevance and analysis consistency of these combined gene sets.

Results

In this study, we proposed a gene superset autoencoder (GSAE), a multi-layer autoencoder model incorporating a priori defined gene sets that retain the crucial biological features in the latent layer. We introduced the concept of the gene superset, an unbiased combination of gene sets with weights trained by the autoencoder, where each node in the latent layer is a superset. Trained with genomic data from TCGA and evaluated with the accompanying clinical parameters, gene supersets showed the ability to discriminate tumor subtypes as well as prognostic capability. We further demonstrated the biological relevance of the top component gene sets in the significant supersets.
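A minimal sketch of the encoder described above, under our own reading of the abstract rather than the authors' code: the first hidden layer has one node per predefined gene set, wired only to that set's member genes (a masked dense layer), and each superset node in the latent layer is a weighted combination of all gene-set nodes. The decoder and training are omitted; gene names, set memberships and weights are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gene_set_layer(expression, genes, gene_sets, weights):
    """One node per gene set; only member genes feed each node.

    expression: values aligned with `genes`
    gene_sets:  {set_name: [member genes]}
    weights:    {set_name: {gene: weight}} (hypothetical, untrained)
    """
    index = {g: k for k, g in enumerate(genes)}
    return {name: sigmoid(sum(weights[name][g] * expression[index[g]] for g in members))
            for name, members in gene_sets.items()}


def superset_layer(set_activations, superset_weights):
    """Each latent 'superset' node combines every gene-set node with its own weights."""
    return [sigmoid(sum(w[name] * a for name, a in set_activations.items()))
            for w in superset_weights]


genes = ["g1", "g2", "g3", "g4"]
gene_sets = {"setA": ["g1", "g2"], "setB": ["g2", "g3", "g4"]}
weights = {name: {g: 1.0 for g in members} for name, members in gene_sets.items()}
sets = gene_set_layer([0.5, -0.2, 1.0, 0.3], genes, gene_sets, weights)
supersets = superset_layer(sets, [{"setA": 0.8, "setB": -0.4}, {"setA": 0.1, "setB": 0.9}])
```

Because of the mask, changing the expression of g3 alters the setB node but leaves the setA node untouched, which keeps each first-layer node tied to a named gene set.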

Conclusions

Using an autoencoder model with gene supersets at its latent layer, we demonstrated that gene supersets retain sufficient biological information with respect to tumor subtypes and clinical prognostic significance. Supersets also provide high reproducibility in survival analysis and accurate prediction of cancer subtypes.

4.

Predicting enhancers with deep convolutional neural networks

Min Xu, Zeng Wanwen, Chen Shengquan, Chen Ning, Chen Ting, Jiang Rui. BMC Bioinformatics, 2017, 18(13):478-46

Background

With the rapid development of deep sequencing techniques in recent years, enhancers have been systematically identified in projects such as FANTOM and ENCODE, forming genome-wide landscapes in a series of human cell lines. Nevertheless, experimental approaches are still costly and time-consuming for large-scale identification of enhancers across a variety of tissues under different disease states, making computational identification of enhancers indispensable.

Results

To facilitate the identification of enhancers, we propose a computational framework, named DeepEnhancer, to distinguish enhancers from background genomic sequences. Our method relies purely on DNA sequences to predict enhancers in an end-to-end manner using a deep convolutional neural network (CNN). We train our deep learning model on permissive enhancers and then adopt a transfer learning strategy to fine-tune the model on enhancers specific to a cell line. Results demonstrate the effectiveness and efficiency of our method in classifying enhancers against random sequences, showing the advantages of deep learning over traditional sequence-based classifiers. We then construct a variety of neural networks with different architectures and show the usefulness of techniques such as max-pooling and batch normalization in our method. To make our approach interpretable, we further visualize convolutional kernels as sequence logos and successfully identify similar motifs in the JASPAR database.
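DeepEnhancer's actual layer sizes are given in the paper, not here. The sketch below illustrates only the sequence-only input pipeline the abstract describes: one-hot encoding of DNA, a single convolutional filter scanned along the sequence, ReLU, and max-pooling. Scanning a k x 4 filter over one-hot DNA is the same as scoring a position weight matrix, which is why learned kernels can be shown as sequence logos. The hand-made `motif_kernel` helper and the `CACGTG` motif are hypothetical choices for illustration.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as 4-element one-hot columns in A, C, G, T order."""
    return [[1.0 if b == base else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(x, kernel):
    """Valid 1-D cross-correlation of one-hot input x with a k x 4 kernel."""
    k = len(kernel)
    return [sum(kernel[j][c] * x[i + j][c] for j in range(k) for c in range(4))
            for i in range(len(x) - k + 1)]


def relu(v):
    return [max(0.0, a) for a in v]


def max_pool(v, size):
    """Non-overlapping max-pooling; a trailing partial window is kept."""
    return [max(v[i:i + size]) for i in range(0, len(v), size)]


def motif_kernel(motif, match=1.0, mismatch=-1.0):
    """Hypothetical hand-made kernel that rewards an exact motif match."""
    return [[match if b == m else mismatch for b in BASES] for m in motif]


scores = relu(conv1d(one_hot("TTCACGTGAA"), motif_kernel("CACGTG")))
pooled = max_pool(scores, 2)
# the exact match at offset 2 scores 6.0; every other offset is clipped to 0 by ReLU
```

In a trained CNN the kernel values are learned rather than hand-set, and many kernels run in parallel.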

Conclusions

DeepEnhancer enables the identification of novel enhancers using only DNA sequences via a highly accurate deep learning model. The proposed computational framework can also be applied to similar problems, thereby promoting the use of machine learning methods in the life sciences.

5.

Statistical analysis of fractionation resistance by functional category and expression

Chen Eric C. H., Morin Annie, Chauchat Jean-Hugues, Sankoff David. BMC Genomics, 2017, 18(4):366-9

Background

The current literature establishes the importance of gene functional category and expression in promoting or suppressing duplicate gene loss after whole genome doubling in plants, a process known as fractionation. Inspired by studies that have reported gene expression to be the dominating factor in preventing duplicate gene loss, we analyzed the relative effect of functional category and expression.

Methods

We use multivariate methods to study data sets on gene retention, function and expression in rosids and asterids to estimate effects and assess their interaction.

Results

Our results suggest that the effects of functional category and expression on duplicate gene retention during fractionation are independent, with no statistical interaction.
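The paper fits multivariate models to real rosid and asterid data. To make "independent effects with no statistical interaction" concrete, the sketch below uses hypothetical retention counts in a 2 x 2 x 2 layout (functional category x expression level x retained/lost) and compares the log odds ratio of retention for high versus low expression within each category. On the log-odds scale, no interaction means these category-specific log odds ratios are equal, which is what an interaction term in a logistic regression tests. All counts are invented for illustration.

```python
import math


def log_odds_ratio(retained_hi, lost_hi, retained_lo, lost_lo):
    """Log odds ratio of retention, high vs low expression (0.5 added to every cell)."""
    a, b, c, d = (x + 0.5 for x in (retained_hi, lost_hi, retained_lo, lost_lo))
    return math.log((a * d) / (b * c))


def log_or_variance(*counts):
    """Woolf's approximate variance of a log odds ratio."""
    return sum(1.0 / (x + 0.5) for x in counts)


def interaction_z(table):
    """Compare the expression effect between two functional categories.

    table: {category: (retained_hi, lost_hi, retained_lo, lost_lo)} with exactly two entries.
    Returns (difference of log odds ratios, z statistic); values near 0 are
    consistent with no expression-by-category interaction on the log-odds scale.
    """
    (_, first), (_, second) = table.items()
    diff = log_odds_ratio(*first) - log_odds_ratio(*second)
    se = math.sqrt(log_or_variance(*first) + log_or_variance(*second))
    return diff, diff / se


# hypothetical counts: expression raises retention similarly in both categories
no_interaction = {"category_A": (40, 10, 20, 20), "category_B": (80, 20, 40, 40)}
# hypothetical counts: the expression effect reverses between categories
interaction = {"category_A": (40, 10, 20, 20), "category_B": (20, 80, 20, 20)}
```

With these counts the first table gives a z statistic close to 0 and the second a large positive one.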

Conclusion

In plants, functional category is the more dominant factor in explaining duplicate gene loss.

6.

Bi-stream CNN Down Syndrome screening model based on genotyping array

Bing Feng, William Hoskins, Yan Zhang, Zibo Meng, David C. Samuels, Jiandong Wang, Ruofan Xia, Chao Liu, Jijun Tang, Yan Guo. BMC Medical Genomics, 2018, 11(5):105

Background

Human Down syndrome (DS) is usually caused by genomic micro-duplications and dosage imbalances of human chromosome 21. It is associated with many genomic and phenotypic abnormalities. Although DS occurs in about 1 in 1,000 births worldwide, a very high rate, no effective method to cure it has yet been found. Currently, the most effective means of preventing DS are screening and early detection.

Methods

In this study, we applied deep learning techniques to a set of Illumina genotyping array data. We built a bi-stream convolutional neural network model to screen for and predict the occurrence of DS. First, we built image input data by converting the intensities of each SNP site into chromosome SNP maps. Next, we proposed a bi-stream convolutional neural network (CNN) architecture with nine layers and two branch models. We merged the two CNN branch models into one model at the fourth convolutional layer and output the prediction in the last layer.
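The nine-layer architecture is specified in the paper, not here. The much-reduced sketch below illustrates only the bi-stream pattern, assuming each stream receives its own 2-D SNP intensity map (for example from two chromosomes) and that merging means stacking the two branches' feature maps as channels before shared layers. Each branch applies one convolution and ReLU; the merged channels are average-pooled and fed to a logistic output. Map sizes, kernels and output weights are hypothetical and untrained, and the real model has several more layers before and after the merge.

```python
import math


def conv2d(img, kernel):
    """Valid 2-D cross-correlation of a single-channel map with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[a][b] * img[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(len(img[0]) - kw + 1)]
            for i in range(len(img) - kh + 1)]


def branch(snp_map, kernel):
    """One stream: convolution followed by ReLU on that stream's own SNP map."""
    return [[max(0.0, v) for v in row] for row in conv2d(snp_map, kernel)]


def bi_stream_predict(map_a, map_b, kernel_a, kernel_b, w_out, bias=0.0):
    """Merge both streams as channels, average-pool each channel, return P(DS)."""
    merged = [branch(map_a, kernel_a), branch(map_b, kernel_b)]
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in merged]
    logit = sum(w * p for w, p in zip(w_out, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# hypothetical 4 x 4 intensity maps and 2 x 2 averaging kernels
ones = [[1.0] * 4 for _ in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]
avg = [[0.25, 0.25], [0.25, 0.25]]
p_same = bi_stream_predict(ones, ones, avg, avg, w_out=[1.0, 1.0], bias=-2.0)
```

The design point is that each branch can specialise on its own input before the shared layers see both.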

Results

Our bi-stream CNN model achieved an average accuracy of 99.3% with very low false-positive and false-negative rates, which are necessary for further applications in disease prediction and medical practice. We further visualized the feature maps and learned filters from intermediate convolutional layers, which revealed genomic patterns and correlated SNP variations in human DS genomes. We also compared our method with other CNN and traditional machine learning models, and analyzed the characteristics and strengths of our bi-stream CNN model.

Conclusions

Our bi-stream model used two branch CNN models to learn, simultaneously, the local genome features and regional patterns among adjacent genes and SNP sites from two chromosomes. It achieved the best performance on all evaluation metrics when compared with two single-stream CNN models and three traditional machine-learning algorithms. The visualized feature maps also provide opportunities to study the genomic markers and pathway components associated with human DS, offering insights for the development of gene therapy and genomic medicine.

7.

RNAe in a transgenic growth hormone mouse model shows potential for use in gene therapy

Haizhou Long, Yi Yao, Shouhong Jin, Yingting Yu, Xiongbing Hu, Fengfeng Zhuang, Hanshuo Zhang, Qiong Wu. Biotechnology Letters, 2017, 39(2):179-188

8.

Deep learning for DNase I hypersensitive sites identification

Lyu Chuqiao, Wang Lei, Zhang Juhua. BMC Genomics, 2018, 19(10):905-165

Background

DNase I hypersensitive sites (DHSs) are associated with cis-regulatory DNA elements. An efficient method for identifying DHSs can enhance our understanding of chromatin accessibility. Despite the many resources available online, including experimental datasets and computational tools, the complex language of DHSs remains incompletely understood.

Methods

Here, we address this challenge using an approach based on a state-of-the-art machine learning method. We present a novel convolutional neural network (CNN) that combines Inception-like networks with a gating mechanism to respond to multiple patterns and long-term associations in DNA sequences, and use it to predict multi-scale DHSs in Arabidopsis, rice and Homo sapiens.
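The abstract names two ingredients without detailing them. The sketch below shows one common way to realise each, which may differ from the authors' design: an Inception-style block that applies filters of several widths in parallel (zero-padded so all outputs have the input's length) and stacks them as channels, and a gated linear unit in which one convolution is multiplied element-wise by the sigmoid of another. All kernel values are hypothetical.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as 4-element one-hot columns in A, C, G, T order."""
    return [[1.0 if b == base else 0.0 for b in BASES] for base in seq.upper()]


def conv_same(x, kernel):
    """1-D convolution over one-hot columns, zero-padded so output length equals input length."""
    k = len(kernel)
    left = (k - 1) // 2
    zero = [0.0, 0.0, 0.0, 0.0]
    xp = [zero] * left + x + [zero] * (k - 1 - left)
    return [sum(kernel[j][c] * xp[i + j][c] for j in range(k) for c in range(4))
            for i in range(len(x))]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_conv(x, kernel_value, kernel_gate):
    """Gated linear unit: the value path is scaled element-wise by a sigmoid gate path."""
    values = conv_same(x, kernel_value)
    gates = conv_same(x, kernel_gate)
    return [v * sigmoid(g) for v, g in zip(values, gates)]


def inception_block(x, kernel_pairs):
    """Apply gated convolutions of different widths in parallel and stack them as channels."""
    return [gated_conv(x, kv, kg) for kv, kg in kernel_pairs]


# hypothetical kernels of widths 1, 3 and 5: value kernels count bases, gates are neutral
pairs = [([[1.0] * 4] * w, [[0.0] * 4] * w) for w in (1, 3, 5)]
channels = inception_block(one_hot("ACGTACGT"), pairs)
```

Different widths let the block see short motifs and longer context at the same position, while the learned gate can suppress positions where a pattern is irrelevant.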

Results

Our method obtains an area under the curve (AUC) of 0.961 on Arabidopsis, 0.969 on rice and 0.918 on Homo sapiens.
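For readers less familiar with the metric: the ROC AUC reported above equals the probability that a randomly chosen positive (DHS) sequence receives a higher score than a randomly chosen negative one, with ties counted as one half. A minimal sketch of that rank-based (Mann-Whitney) computation, with hypothetical scores:

```python
def auc(scores_pos, scores_neg):
    """ROC AUC as the Mann-Whitney probability P(score_pos > score_neg); ties count 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# hypothetical classifier scores: 5 of the 6 positive/negative pairs are ranked correctly
example = auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

This pairwise loop is O(n x m) and meant for clarity; for large datasets a sort-based rank computation gives the same value.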

Conclusions

Our method provides an efficient and accurate way to identify multi-scale DHS sequences by deep learning.

9.

Metastatic tumor cells – genotypes and phenotypes

Dingcheng Gao, Vivek Mittal, Yi Ban, Ana Rita Lourenco, Shira Yomtoubian, Sharrell Lee. Frontiers in Biology, 2018, 13(4):277-286

Background

Metastasis is the primary cause of mortality in cancer patients. Therefore, elucidating the genetics and epigenetics of metastatic tumor cells and the mechanisms by which tumor cells acquire metastatic properties constitute significant challenges in cancer research.

Objective

To summarize the current understanding of the specific genotypes and phenotypes of metastatic tumor cells.

Method and Result

In-depth genetic analyses of tumor cells, especially with advances in next-generation sequencing, have revealed insights into the genotypes of metastatic tumor cells. In addition, studies have shown that the cancer stem cell (CSC) and epithelial-to-mesenchymal transition (EMT) phenotypes are associated with the metastatic cascade.

Conclusion

In this review, we will discuss recent advances in the field by focusing on the genomic instability and phenotypic dynamics of metastatic tumor cells.

10.

Prediction of protein self-interactions using stacked long short-term memory from protein sequences information

Yan-Bin Wang, Zhu-Hong You, Xiao Li, Tong-Hai Jiang, Li Cheng, Zhan-Heng Chen. BMC Systems Biology, 2018, 12(8):129

Background

Self-interacting proteins (SIPs) play a critical role in a range of life functions in most living cells, and research on SIPs is an important part of molecular biology. Although a large amount of SIP data is available, traditional experimental methods are labor-intensive, time-consuming and costly, and can yield only limited results relative to real-world needs. Hence, there is an urgent need for an efficient computational SIP prediction method to fill the gap. Deep learning technologies have produced disruptive performance improvements in many areas, but the effectiveness of deep learning methods for SIP prediction has not been verified.

Results

We developed a deep learning model for predicting SIPs by constructing a stacked long short-term memory (SLSTM) neural network with dropout. We extracted features from protein sequences using a novel feature extraction scheme that combines Zernike moments (ZMs) with a position-specific weight matrix (PSWM). The approach was assessed on S. cerevisiae and human SIP datasets. The results indicate that the deep learning approach can effectively resist data skew, achieving accuracies of 95.69% and 97.88%, respectively. To demonstrate the advantages of deep learning, we compared the SLSTM-based method with the well-known support vector machine (SVM) method and several other established methods on the same datasets.
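The abstract gives no layer sizes. As a reminder of what a stacked LSTM with dropout computes, the sketch below implements the standard LSTM cell equations (input, forget and output gates plus a candidate cell state) in plain Python, feeds each layer's hidden state into the next layer, and applies inverted dropout between layers during training. The input vectors merely stand in for the ZM/PSWM features; the class `LSTMCell`, the sizes, the random weights and the dropout rate are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Standard LSTM cell with random, untrained weights acting on [x, h_prev]."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        n = n_in + n_hidden
        self.w = {g: [[rng.gauss(0.0, 0.3) for _ in range(n)] for _ in range(n_hidden)]
                  for g in ("i", "f", "o", "c")}
        self.b = {g: [0.0] * n_hidden for g in ("i", "f", "o", "c")}

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenated input and previous hidden state
        pre = {gate: [sum(wk * zk for wk, zk in zip(row, z)) + bk
                      for row, bk in zip(self.w[gate], self.b[gate])]
               for gate in self.w}
        i = [sigmoid(v) for v in pre["i"]]       # input gate
        f = [sigmoid(v) for v in pre["f"]]       # forget gate
        o = [sigmoid(v) for v in pre["o"]]       # output gate
        cand = [math.tanh(v) for v in pre["c"]]  # candidate cell state
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def dropout(v, rate, rng, training):
    """Inverted dropout: zero units with probability `rate`, rescale survivors."""
    if not training or rate == 0.0:
        return list(v)
    return [0.0 if rng.random() < rate else x / (1.0 - rate) for x in v]


def stacked_lstm(sequence, layers, rate=0.2, training=False, seed=0):
    """Run feature vectors through stacked LSTM layers; return the top layer's last hidden state."""
    rng = random.Random(seed)
    states = [([0.0] * layer.n_hidden, [0.0] * layer.n_hidden) for layer in layers]
    for x in sequence:
        inp = x
        for k, layer in enumerate(layers):
            h, c = layer.step(inp, *states[k])
            states[k] = (h, c)
            inp = dropout(h, rate, rng, training)  # dropout between stacked layers
    return states[-1][0]


layers = [LSTMCell(3, 4, random.Random(1)), LSTMCell(4, 2, random.Random(2))]
top = stacked_lstm([[0.1, 0.2, 0.3], [0.0, -0.5, 1.0]], layers)
```

A classifier head (for example a logistic unit on `top`) would sit on the last layer; at prediction time dropout is switched off, which is the `training=False` default here.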

Conclusion

The results show that our method is superior overall to the other existing state-of-the-art techniques. To our knowledge, this study is the first to apply deep learning to SIP prediction, and the experimental results reveal its potential for SIP identification.

11.

Mechanisms of genome instability in Hutchinson-Gilford progeria

Haoyue Zhang, Kan Cao. Frontiers in Biology, 2017, 12(1):49-62

Background

Hutchinson-Gilford progeria syndrome (HGPS) is a devastating premature aging disorder. It arises from a single point mutation in the LMNA gene. This mutation stimulates an aberrant splicing event and produces progerin, an isoform of the lamin A protein. Accumulation of progerin disrupts numerous physiological pathways and induces defects in nuclear architecture, gene expression, histone modification, cell cycle regulation, mitochondrial functionality, genome integrity and much more.

Objective

Among these phenotypes, genomic instability is tightly associated with physiological aging and considered a main contributor to the premature aging phenotypes. However, our understanding of the underlying molecular mechanisms of progerin-caused genome instability is far from clear.

Results and Conclusion

In this review, we summarize some of the recent findings and discuss potential mechanisms through which progerin affects DNA damage repair and leads to genome instability.

12.

Pathway-based analyses

Jack W. Kent Jr. BMC Genetics, 2016, 17(Z2):S5

Background

New technologies for acquiring genomic data, while offering unprecedented opportunities for genetic discovery, also impose severe burdens of interpretation and penalties for multiple testing.

Methods

The Pathway-based Analyses Group of the Genetic Analysis Workshop 19 (GAW19) sought to reduce the multiple-testing burden through various approaches to aggregating high-dimensional data in pathways informed by prior biological knowledge.

Results

Experimental methods tested included the use of "synthetic pathways" (random sets of genes) to estimate the power and false-positive error rate of methods applied to simulated data; data reduction via independent components analysis, single-nucleotide polymorphism (SNP)-SNP interaction, and the use of gene sets to estimate genetic similarity; and general assessment of the efficacy of prior biological knowledge in reducing the dimensionality of complex genomic data.
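A minimal sketch of the "synthetic pathway" idea, under assumptions of our own: each pathway is tested by combining its genes' p-values with Fisher's method (one of many possible pathway tests, not necessarily one used by the group), and the false-positive rate is estimated as the fraction of random gene sets that reach significance when every gene-level p-value comes from null data. Gene names and the simulated null p-values are hypothetical.

```python
import math
import random


def fisher_combined_p(pvalues):
    """Fisher's method: X = -2 * sum(log p) is chi-square with 2k df under the null.

    For even degrees of freedom the upper tail has the closed form
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def synthetic_pathway_fpr(gene_pvalues, set_size, n_sets=2000, alpha=0.05, seed=0):
    """Fraction of random gene sets ('synthetic pathways') that test significant."""
    rng = random.Random(seed)
    genes = list(gene_pvalues)
    hits = 0
    for _ in range(n_sets):
        chosen = rng.sample(genes, set_size)
        if fisher_combined_p([gene_pvalues[g] for g in chosen]) < alpha:
            hits += 1
    return hits / n_sets


# hypothetical null data: 5000 genes with uniform p-values (1 - random() avoids log(0))
rng = random.Random(42)
null_p = {f"gene{i}": 1.0 - rng.random() for i in range(5000)}
fpr = synthetic_pathway_fpr(null_p, set_size=20)  # expected to sit near alpha = 0.05
```

The same loop run on data with simulated signal would estimate power instead; a rate well above alpha on null data would flag a miscalibrated pathway test.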

Conclusions

The work of this group explored several promising approaches to managing high-dimensional data, with the caveat that these methods are necessarily constrained by the quality of external bioinformatic annotation.

13.

Deep learning architectures for prediction of nucleosome positioning from sequences data

Mattia Di Gangi, Giosuè Lo Bosco, Riccardo Rizzo. BMC Bioinformatics, 2018, 19(14):418

Background

Nucleosomes are DNA-histone complexes, each wrapping about 150 base pairs of double-stranded DNA. They are fundamental to one of the primary functions of chromatin, i.e. packing DNA into the nucleus of eukaryotic cells. Several biological studies have shown that nucleosome positioning influences the regulation of cell type-specific gene activities. Moreover, computational studies have shown evidence of sequence specificity in the DNA fragments wrapped into nucleosomes, clearly underlined by the organization of particular DNA substrings. As a consequence, nucleosomes have been successfully identified on a genomic scale by computational methods using sequence feature representations.

Results

In this work, we propose a deep learning model for nucleosome identification. Our model stacks convolutional layers and long short-term memory (LSTM) layers to automatically extract features from short- and long-range dependencies in a sequence. With this model we avoid separate feature extraction and selection steps while improving classification performance.

Conclusions

Results computed on eleven data sets of five different organisms, from Yeast to Human, show the superiority of the proposed method with respect to the state of the art recently presented in the literature.

14.

DL-ADR: a novel deep learning model for classifying genomic variants into adverse drug reactions

Zhaohui Liang, Jimmy Xiangji Huang, Xing Zeng, Gang Zhang. BMC Medical Genomics, 2016, 9(2):48

Background

Genomic variations are associated with the metabolism of many therapeutic agents and with the occurrence of their adverse reactions. Polymorphisms at over 2000 locations in cytochrome P450 (CYP) enzymes, arising from factors such as ethnicity, mutation and inheritance, contribute to the diversity of responses to, and side effects of, various drugs. The associations among single nucleotide polymorphisms (SNPs), internal pharmacokinetic patterns and vulnerability to specific adverse reactions have become a research focus in pharmacogenomics. Conventional genome-wide association studies (GWAS) mainly focus on the relation of single or multiple SNPs to a specific risk factor, which is a one-to-many relation. However, there are no robust methods to establish a many-to-many network that can combine the direct and indirect associations between multiple SNPs and a series of events (e.g. adverse reactions, metabolic patterns, prognostic factors). In this paper, we present a novel deep learning model based on generative stochastic networks and a hidden Markov chain to classify observed samples, by their SNPs at five loci of two genes (CYP2D6 and CYP1A2), into populations vulnerable to 14 types of adverse reactions.

Methods

A supervised deep learning model is proposed in this study, using a revised generative stochastic network (GSN) model whose transitions are driven by a hidden Markov chain. The training data were collected from clinical observation: 83 blood samples genotyped at CYP2D6*2, *10, *14 and CYP1A2*1C, *1F by the polymerase chain reaction (PCR) method. The hidden Markov chain serves as the transition operator to simulate the probability distribution. Because the transition distribution is conditional on the previous state of the hidden Markov chain, the model can learn at lower cost than the conventional maximum likelihood method. A least absolute shrinkage and selection operator (LASSO) algorithm and a k-nearest neighbors (kNN) algorithm are used as baselines to evaluate the performance of the proposed deep learning model.
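The GSN model is too involved for a short example, but the kNN baseline mentioned above is simple to state. The sketch below assumes each sample is encoded as a vector of genotype codes, one per locus (for example 0, 1 or 2 copies of the variant allele; this encoding is our assumption), labelled with an adverse-reaction category, and classifies a new sample by majority vote among its k nearest training samples under Hamming distance. Samples and labels are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of loci at which two genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Majority vote among the k nearest training samples (Hamming distance).

    train: list of (genotype_vector, label) pairs. Ties in the vote go to the
    label of the nearest tied neighbour.
    """
    neighbours = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    for _, label in neighbours:
        if votes[label] == best:
            return label


# hypothetical genotype codes at five loci, labelled with ADR categories "A" and "B"
train = [((0, 1, 2, 0, 0), "A"), ((0, 1, 2, 0, 1), "A"),
         ((2, 2, 0, 1, 1), "B"), ((2, 2, 0, 1, 0), "B"), ((2, 1, 0, 1, 1), "B")]
prediction = knn_predict(train, (0, 1, 2, 0, 0))
```

Distances between genotype vectors carry no notion of which locus matters more, which is one reason a learned model can outperform this baseline.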

Results

Fifty-three adverse reactions were reported during the observation period and assigned to 14 categories. In classification accuracy, the deep learning model outperformed the LASSO and kNN models, achieving an accuracy above 80%. In terms of reliability, the deep learning model showed the best stability of the three models.

Conclusions

Machine learning provides a new method to explore the complex associations among genomic variations and multiple events in pharmacogenomics studies. The new deep learning algorithm is capable of classifying various SNPs to the corresponding adverse reactions. We expect that as more genomic variations are added as features and more observations are made, the deep learning model can improve its performance and can act as a black-box but reliable verifier for other GWAS studies.

15.

Eigenvector metabolite analysis reveals dietary effects on the association among metabolite correlation patterns, gene expression, and phenotypes

Clare H. Scott Chialvo, Ronglin Che, David Reif, Alison Motsinger-Reif, Laura K. Reed. Metabolomics: Official Journal of the Metabolomic Society, 2016, 12(11):167

16.

Metabolic response of porcine colon explants to in vitro infection by Brachyspira hyodysenteriae: a leap into disease pathophysiology

Thijs Welle, Anna T. Hoekstra, Ineke A. J. J. M. Daemen, Celia R. Berkers, Matheus O. Costa. Metabolomics: Official Journal of the Metabolomic Society, 2017, 13(7):83

Introduction

Swine dysentery, caused by Brachyspira hyodysenteriae, is a production-limiting disease in pig farming. Currently, antimicrobial therapy is the only available treatment and control method.

Objective

The aim of this study was to characterize the metabolic response of porcine colon explants to infection by B. hyodysenteriae.

Methods

Porcine colon explants exposed to B. hyodysenteriae were analyzed for histopathological, metabolic and pro-inflammatory gene expression changes.

Results

Significant epithelial necrosis and increased levels of L-citrulline and IL-1α were observed in explants infected with B. hyodysenteriae.

Conclusions

The spirochete induces necrosis in vitro likely through an inflammatory process mediated by IL-1α and NO.

17.

Resolution effects in reconstructing ancestral genomes

Chunfang Zheng, Yuji Jeong, Madisyn Gabrielle Turcotte, David Sankoff. BMC Genomics, 2018, 19(2):100

Background

The reconstruction of ancestral genomes must deal with the problem of resolution, necessarily involving a trade-off between trying to identify genomic details and being overwhelmed by noise at higher resolutions.

Results

We use the median reconstruction, at the synteny block level, of the ancestral genome of the order Gentianales, based on coffee, Rhazya stricta and grape, to exemplify the effects of resolution (granularity) on comparative genomic analyses.

Conclusions

We show how decreased resolution blurs the differences between evolving genomes, with respect to rate, mutational process and other characteristics.

18.

Transcriptome map of mouse isochores

Stilianos Arhondakis, Kimon Frousios, Costas S Iliopoulos, Solon P Pissis, German Tischler, Sophia Kossida. BMC Genomics, 2011, 12(1):511

19.

Application of chicken microarrays for gene expression analysis in other avian species

Tamsyn M Crowley, Volker R Haring, Simon Burggraaf, Robert J Moore. BMC Genomics, 2009, 10(Z2):S3

Background

With the threat of emerging infectious diseases such as avian influenza, whose natural hosts are thought to be a variety of wild water birds including ducks, we have very few genomic resources for large-scale immunological gene expression studies in avian species. Multiple options exist for conducting large gene expression studies in chickens, and in this study we explore the feasibility of using one of these tools to investigate gene expression in other avian species.

Results

In this study we utilised a whole genome long oligonucleotide chicken microarray to assess the utility of cross species hybridisation (CSH). We successfully hybridised a number of different avian species to this array, obtaining reliable signals. We were able to distinguish ducks that were infected with avian influenza from uninfected ducks using this microarray platform. In addition, we were able to detect known chicken immunological genes in all of the hybridised avian species.

Conclusion

Cross species hybridisation using long oligonucleotide microarrays is a powerful tool to study the immune response in avian species with little available genomic information. The present study validated the use of the whole genome long oligonucleotide chicken microarray to investigate gene expression in a range of avian species.

20.

Multi-target drug repositioning by bipartite block-wise sparse multi-task learning

Limin Li, Xiao He, Karsten Borgwardt. BMC Systems Biology, 2018, 12(4):55

Background

Finding potential drug targets is a crucial step in drug discovery and development. Recently, resources such as the Library of Integrated Network-Based Cellular Signatures (LINCS) L1000 database provide gene expression profiles induced by various chemical and genetic perturbations and thereby make it possible to analyze the relationship between compounds and gene targets at a genome-wide scale. Current approaches for comparing the expression profiles are based on pairwise connectivity mapping analysis. However, this method makes the simple assumption that the effect of a drug treatment is similar to knocking down its single target gene. Since many compounds can bind multiple targets, the pairwise mapping ignores the combined effects of multiple targets, and therefore fails to detect many potential targets of the compounds.

Results

We propose an algorithm to find sets of gene knock-downs that induce gene expression changes similar to a drug treatment. Assuming that the effects of gene knock-downs are additive, we propose a novel bipartite block-wise sparse multi-task learning model with super-graph structure (BBSS-MTL) for multi-target drug repositioning that overcomes the restrictive assumptions of connectivity mapping analysis.
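BBSS-MTL adds a bipartite block-wise sparsity structure and a super-graph across tasks that a short example cannot reproduce. The sketch below illustrates only the core assumption stated above, that a drug's expression signature is approximately an additive, sparse combination of knock-down signatures. It fits a plain single-task lasso by iterative soft-thresholding (ISTA) and, for contrast, computes a pairwise Pearson correlation per knock-down as a simple stand-in for pairwise connectivity scoring (actual connectivity-map scores are rank-based). Signatures and the regularisation strength `lam` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two signatures (undefined if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def soft_threshold(v, t):
    return math.copysign(max(abs(v) - t, 0.0), v)


def lasso_ista(knockdowns, drug, lam=0.05, iters=2000):
    """Minimise 0.5 * ||drug - sum_j w_j * kd_j||^2 + lam * ||w||_1 with ISTA.

    knockdowns: one signature per gene knock-down, each a list over the same genes.
    """
    n_kd, n_genes = len(knockdowns), len(drug)
    # 1 / ||X||_F^2 is a safe step size: ||X||_F^2 >= largest eigenvalue of X^T X
    step = 1.0 / sum(v * v for kd in knockdowns for v in kd)
    w = [0.0] * n_kd
    for _ in range(iters):
        fitted = [sum(w[j] * knockdowns[j][g] for j in range(n_kd)) for g in range(n_genes)]
        resid = [f - d for f, d in zip(fitted, drug)]
        grad = [sum(r * kd[g] for g, r in enumerate(resid)) for kd in knockdowns]
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return w


# hypothetical knock-down signatures over six genes; the drug mimics kd0 + kd2
kds = [[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 1, 0], [0, 0, 1, 0, 0, 1], [1, 1, 0, 0, 0, 0]]
drug = [1, 0, 1, 1, 0, 1]
weights = lasso_ista(kds, drug)
pairwise = [pearson(kd, drug) for kd in kds]
```

With `lam = 0.05` the fitted weights come out at about 0.975 for kd0 and kd2 and zero for the other two; the shortfall from 1 is the usual lasso shrinkage. The joint fit names a set of knock-downs that together explain the drug signature, whereas pairwise scores only rank knock-downs one at a time.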

Conclusions

The proposed method BBSS-MTL is more accurate for predicting potential drug targets than the simple pairwise connectivity mapping analysis on five datasets generated from different cancer cell lines.

Availability

The code can be obtained at http://gr.xjtu.edu.cn/web/liminli/codes.
