Similar articles
 20 similar articles found (search time: 15 ms)
1.
Min Xu, Zeng Wanwen, Chen Shengquan, Chen Ning, Chen Ting, Jiang Rui. 《BMC bioinformatics》2017,18(13):478-46

Background

With the rapid development of deep sequencing techniques in recent years, enhancers have been systematically identified in projects such as FANTOM and ENCODE, forming genome-wide landscapes in a series of human cell lines. Nevertheless, experimental approaches are still costly and time-consuming for large-scale identification of enhancers across a variety of tissues under different disease states, making computational identification of enhancers indispensable.

Results

To facilitate the identification of enhancers, we propose a computational framework, named DeepEnhancer, to distinguish enhancers from background genomic sequences. Our method relies purely on DNA sequences to predict enhancers in an end-to-end manner by using a deep convolutional neural network (CNN). We train our deep learning model on permissive enhancers and then adopt a transfer learning strategy to fine-tune the model on enhancers specific to a cell line. Results demonstrate the effectiveness and efficiency of our method in the classification of enhancers against random sequences, exhibiting advantages of deep learning over traditional sequence-based classifiers. We then construct a variety of neural networks with different architectures and show the usefulness of techniques such as max-pooling and batch normalization in our method. To make our approach interpretable, we further visualize convolutional kernels as sequence logos and successfully identify similar motifs in the JASPAR database.

Conclusions

DeepEnhancer enables the identification of novel enhancers using only DNA sequences via a highly accurate deep learning model. The proposed computational framework can also be applied to similar problems, thereby prompting the use of machine learning methods in life sciences.
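
The sequence-only input described in the Results above can be illustrated with a minimal sketch. It assumes a one-hot encoding over A/C/G/T (a common choice; the abstract does not state the encoding) and shows how a single convolutional kernel scores every window of a sequence. The names `one_hot` and `scan_kernel`, the toy sequence, and the TATA kernel are hypothetical and are not taken from DeepEnhancer.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) matrix; unknown bases such as N stay all-zero."""
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            mat[i, j] = 1.0
    return mat

def scan_kernel(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a (width, 4) kernel along the sequence, as one 1D convolutional filter would."""
    width = kernel.shape[0]
    return np.array([np.sum(x[i:i + width] * kernel)
                     for i in range(x.shape[0] - width + 1)])

x = one_hot("ACGTTATAAN")          # toy input sequence (assumption)
kernel = one_hot("TATA")           # hypothetical kernel that matches the motif TATA exactly
scores = scan_kernel(x, kernel)    # one score per window of width 4
best = int(np.argmax(scores))      # start position of the best-matching window
```

In a trained network the kernel weights are learned rather than fixed, which is why visualizing them as sequence logos can reveal motifs.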

2.
Back-propagation, feed-forward neural networks are used to predict the secondary structures of membrane proteins whose structures are known to atomic resolution. These networks are trained on globular proteins and can predict globular protein structures having no homology to those of the training set with correlation coefficients (C) of 0.45, 0.32 and 0.43 for α-helix, β-strand and random coil structures, respectively. When tested on membrane proteins, neural networks trained on globular proteins do, on average, correctly predict (Qi) 62%, 38% and 69% of the residues in the α-helix, β-strand and random coil structures. These scores rank higher than those obtained with the currently used statistical methods and are comparable to those obtained with the joint approaches tested so far on membrane proteins. The lower success score for β-strand as compared to the other structures suggests that the sample of β-strand patterns contained in the training set is less representative than those of α-helix and random coil. Our analysis, which includes the effects of the network parameters and of the structural composition of the training set on the prediction, shows that regular patterns of secondary structures can be successfully extrapolated from globular to membrane proteins. Correspondence to: R. Casadio
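
The two scores quoted above can be computed per structural state as in the following sketch. It assumes that Qi is the fraction of residues observed in state i that are predicted as i, and that C is the Matthews correlation coefficient usually reported in secondary-structure prediction; the abstract does not spell out either definition. The function name and the toy strings (H = helix, E = strand, C = coil) are hypothetical.

```python
from math import sqrt

def per_state_scores(observed: str, predicted: str, state: str):
    """Return (Q_i, C_i) for one state label, under the assumed definitions."""
    tp = fp = tn = fn = 0
    for o, p in zip(observed, predicted):
        if o == state and p == state:
            tp += 1            # correctly predicted as this state
        elif o != state and p == state:
            fp += 1            # over-prediction
        elif o == state and p != state:
            fn += 1            # under-prediction
        else:
            tn += 1            # correctly rejected
    q = tp / (tp + fn) if (tp + fn) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    c = (tp * tn - fp * fn) / denom if denom else 0.0
    return q, c

observed  = "HHHHEEEECCCC"     # toy observed states (assumption)
predicted = "HHHCEECCCCCH"     # toy predicted states (assumption)
q_h, c_h = per_state_scores(observed, predicted, "H")
```

Unlike Qi, the correlation coefficient penalizes over-prediction, so a state can have a high Qi and still a modest C.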

3.
Background: In the human genome, distal enhancers are involved in regulating target genes through proximal promoters by forming enhancer-promoter interactions. Although recently developed high-throughput experimental approaches have allowed us to recognize potential enhancer-promoter interactions genome-wide, it is still largely unclear to what extent the sequence-level information encoded in our genome helps guide such interactions. Methods: Here we report a new computational method (named “SPEID”) using deep learning models to predict enhancer-promoter interactions based on sequence-based features only, when the locations of putative enhancers and promoters in a particular cell type are given. Results: Our results across six different cell types demonstrate that SPEID is effective in predicting enhancer-promoter interactions as compared to state-of-the-art methods that only use information from a single cell type. As a proof-of-principle, we also applied SPEID to identify somatic non-coding mutations in melanoma samples that may have reduced enhancer-promoter interactions in tumor genomes. Conclusions: This work demonstrates that deep learning models can help reveal that sequence-based features alone are sufficient to reliably predict enhancer-promoter interactions genome-wide.

4.
The back-propagation neural network algorithm is a commonly used method for predicting the secondary structure of proteins. Whilst popular, this method can be slow to learn and here we compare it with an alternative: the cascade-correlation architecture. Using a constructive algorithm, cascade-correlation achieves predictive accuracies comparable to those obtained by back-propagation, in shorter time.

5.
《Biophysical journal》2022,121(20):3883-3895
One of the fundamental limitations of accurately modeling biomolecules like DNA is the inability to perform quantum chemistry calculations on large molecular structures. We present a machine learning model based on an equivariant Euclidean neural network framework to obtain accurate ab initio electron densities for arbitrary DNA structures that are much too large for conventional quantum methods. The model is trained on representative B-DNA basepair steps that capture both base pairing and base stacking interactions. The model produces accurate electron densities for arbitrary B-DNA structures with typical errors of less than 1%. Crucially, the error does not increase with system size, which suggests that the model can extrapolate to large DNA structures with negligible loss of accuracy. The model also generalizes reasonably to other DNA structural motifs such as the A- and Z-DNA forms, despite being trained on only B-DNA configurations. The model is used to calculate electron densities of several large-scale DNA structures, and we show that the computational scaling for this model is essentially linear. We also show that this machine learning electron density model can be used to calculate accurate electrostatic potentials for DNA. These electrostatic potentials produce more accurate results compared with classical force fields and do not show the usual deficiencies at short range.

6.
7.
8.
The interest in studying metabolic alterations in cancer and their potential role as novel targets for therapy has been rejuvenated in recent years. Here, we report the development of the first genome-scale network model of cancer metabolism, validated by correctly identifying genes essential for cellular proliferation in cancer cell lines. The model predicts 52 cytostatic drug targets, of which 40% are targeted by known, approved or experimental anticancer drugs, and the rest are new. It further predicts combinations of synthetic lethal drug targets, whose synergy is validated using available drug efficacy and gene expression measurements across the NCI-60 cancer cell line collection. Finally, potential selective treatments for specific cancers that depend on cancer type-specific downregulation of gene expression and somatic mutations are compiled.

9.
10.
11.
Gurbuz Hasan, Kivrak Ersin, Soyupak Selcuk, Yerli Sedat V. 《Hydrobiologia》2003,498(1-3):133-141
A 14.6 m long profile from the northern part of Hulun Lake, the northernmost of the large lakes of China, provides a sedimentary and diatom record since the late Glacial. The chronological sequence was established from 10 radiocarbon dates. Sedimentological study and diatom analysis are combined to reconstruct the history of lake-level changes. The results show that the Hulun basin was not occupied by a lake during the Last Glaciation. A rapid transition to a deep lake began at 12850 yr B.P., and this high-level phase lasted until 11200 yr B.P., although there were several subordinate lake-level fluctuations. An abrupt lake-level drop and dry climatic conditions occurred during 11200–10600 yr B.P. The lake became deeper again from 10600 yr B.P. to 10300 yr B.P. In the early Holocene, Hulun Lake was characterized by a low lake level; the level rose again in 7200–5800 yr B.P., though it fluctuated considerably. Dry conditions returned and the lake level declined again during 5800–3000 yr B.P. The presence of a palaeosol at the top of the profile indicates that low lake levels persisted after 3000 yr B.P. Comparison with other lake-level records from northern China suggests that Hulun Lake has a different lake-level history from the lakes in monsoon areas.

12.
13.
Clustering with neural networks   Cited by: 3 (self-citations: 0, citations by others: 3)
Partitioning a set of N patterns in a d-dimensional metric space into K clusters, in a way that those in a given cluster are more similar to each other than the rest, is a problem of interest in many fields, such as image analysis, taxonomy, astrophysics, etc. As there are approximately K^N/K! possible ways of partitioning the patterns among K clusters, finding the best solution is beyond exhaustive search when N is large. We show that this problem, in spite of its exponential complexity, can be formulated as an optimization problem for which very good, but not necessarily optimal, solutions can be found by using a Hopfield model of neural networks. To obtain a very good solution, the network must start from many randomly selected initial states. The network is simulated on the MPP, a 128 × 128 SIMD array machine, where we use the massive parallelism not only in solving the differential equations that govern the evolution of the network, but also in starting the network from many initial states at once, thus obtaining many solutions in one run. We achieve speedups of two to three orders of magnitude over serial implementations and the promise through Analog VLSI implementations of further speedups of three to six orders of magnitude. Supported by a National Research Council-NASA Research Associateship
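
The size of the search space quoted above can be checked numerically. The sketch below compares the approximation K^N/K! with the exact count of partitions of N labelled patterns into K non-empty clusters, which is the Stirling number of the second kind. The function names are hypothetical and the exact count is standard combinatorics, not something taken from the paper.

```python
from math import factorial

def stirling2(n: int, k: int) -> int:
    """Exact number of ways to split n labelled items into k non-empty clusters."""
    # Recurrence: S(n, k) = k * S(n - 1, k) + S(n - 1, k - 1), with S(0, 0) = 1
    row = [1] + [0] * k
    for _ in range(n):
        new = [0] * (k + 1)
        for j in range(1, k + 1):
            new[j] = j * row[j] + row[j - 1]
        row = new
    return row[k]

def approx_partitions(n: int, k: int) -> float:
    """The K^N / K! approximation used in the abstract."""
    return k ** n / factorial(k)

exact_small = stirling2(10, 3)           # exact count for N = 10, K = 3
approx_small = approx_partitions(10, 3)  # approximation for the same case
exact_large = stirling2(100, 5)          # already far beyond exhaustive search
```

The approximation improves as N grows relative to K, and the count grows exponentially in N, which is why exhaustive search is ruled out.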

14.
Spatial prediction needs to account for spatial information, which makes conventional radial basis function (RBF) networks inappropriate, for they assume independent and identically distributed data. In this paper, we fuse spatial information at different layers of the RBF network. Experiments show that fusion at the hidden layer gives the best result and suggest that the optimal value is around one for the coefficient, which is used in the linear combination at the output layer.
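
For orientation, the forward pass of a conventional Gaussian RBF network (the baseline the abstract argues against) can be sketched as follows. The spatial fusion itself is not shown because the abstract does not specify it. The function name, the shared kernel width, and the toy centers and weights are assumptions made for illustration.

```python
import numpy as np

def rbf_forward(x, centers, width, weights, bias=0.0):
    """Output of a Gaussian RBF network for inputs x of shape (n, d)."""
    # Squared Euclidean distance from every input to every center: shape (n, m)
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    hidden = np.exp(-d2 / (2.0 * width ** 2))   # hidden-layer activations
    return hidden @ weights + bias              # linear combination at the output layer

centers = np.array([[0.0, 0.0], [1.0, 1.0]])    # toy RBF centers (assumption)
weights = np.array([2.0, -1.0])                 # toy output weights (assumption)
x = np.array([[0.0, 0.0]])
y = rbf_forward(x, centers, width=1.0, weights=weights)
```

Each prediction here depends only on the input's own features, which is the independence assumption that spatial fusion is meant to relax.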

15.
For individuals with paraplegia, standing up requires activation of paralyzed leg muscles by an artificial functional electrical stimulation (FES) controller and voluntary control of arm forces by the individual. Any knowledge of such voluntary control, particularly its prediction, could be used to design more effective FES controllers. Therefore, artificial neural network models were developed to predict voluntary arm forces from measured angular positions of the ankle, knee, and hip joints during FES-assisted standing up in paraplegia. The training data were collected from eight paraplegic subjects in repeated standing-up trials, and divided into two categories for training and validation. The predictions of the models closely followed both the training and validation data, showing good accuracy and generalization. The comparison of the models showed that, although there are striking similarities among the voluntary controls adopted by different subjects, each subject develops his/her own 'personal strategy' to control the arm forces, which is consistent from trial to trial. The level of consistency was dependent on the experience in using FES, injury level, body weight, and other subject-specific parameters. Received: 5 January 1999 / Accepted in revised form: 29 January 2001

16.
17.
The purpose of this study was to investigate strategies in the monotherapy treatment of HIV infection in the presence of drug-resistant (mutant) strains. A mathematical system is developed to model resistance in HIV chemotherapy. It includes the key players in the immune response to HIV infection: virus and both uninfected CD4+ and infected CD4+ T-cell populations. We model the latent and progressive stages of the disease, and then introduce monotherapy treatment. The model is a system of differential equations describing the interaction of two distinct classes of HIV, drug-sensitive (wild type) and drug-resistant (mutant), with lymphocytes in the peripheral blood. We then introduce chemotherapy effects. In the absence of treatment, the model produces the three types of qualitative clinical behavior: an uninfected steady state, an infected steady state (latency), and progression to AIDS. Simulation of treatment is provided for monotherapy, during the progression to AIDS state, in the consideration of resistance effects. Treatment benefit is based on an increase or retention in CD4+ T-cell counts together with a low viral titer. We explore the following treatment approaches: an antiviral drug which reduces viral infectivity that is administered early, when the CD4+ T-cell count is ≥300/mm³, and late, when the CD4+ T-cell count is less than 300/mm³. We compare all results with data. When treatment is initiated during the progression to AIDS state, treatment prevents T-cell collapse, but gradually loses effectiveness due to drug resistance. We hypothesize that it is the careful balance of mutant and wild-type HIV strains which provides the greatest prolonged benefit from treatment. This is best achieved when treatment is initiated when the CD4+ T-cell counts are greater than 250/mm³, but less than 400/mm³ in this model (i.e. not too early, not too late). These results are supported by clinical data.
The work is novel in that it is the first model to accurately simulate data before, during and after monotherapy treatment. Our model also provides insight into recent clinical results, as well as suggests plausible guidelines for clinical testing in the monotherapy of HIV infection.

18.
Rapid enzymatic test for phenotypic HIV protease drug resistance   Cited by: 1 (self-citations: 0, citations by others: 1)
A phenotypic resistance test based on recombinant expression of the active HIV protease in E. coli from patient blood samples was developed. The protease is purified in a rapid one-step procedure as active enzyme and tested for inhibition by five selected synthetic inhibitors (amprenavir, indinavir, nelfinavir, ritonavir, and saquinavir) used presently for chemotherapy of HIV-infected patients. The HPLC system used in a previous approach was replaced by a continuous fluorogenic assay suitable for high-throughput screening on microtiter plates. This reduces significantly the total assay time and allows the determination of inhibition constants (Ki). The Michaelis constant (Km) and the inhibition constant (Ki) of recombinant wild-type protease agree well with published data for cloned HIV protease. The enzymatic test was evaluated with recombinant HIV protease derived from eight HIV-positive patients scored from 'sensitive' to 'highly resistant' according to mutations detected by genotypic analysis. The measured Ki values correlate well with the genotypic resistance scores, but allow a higher degree of differentiation. The non-infectious assay enables a more rapid yet sensitive detection of HIV protease resistance than other phenotypic assays.

19.

Background  

In HIV treatment it is critical to have up-to-date resistance data of applicable drugs since HIV has a very high rate of mutation. These data are made available through scientific publications and must be extracted manually by experts in order to be used by virologists and medical doctors. Therefore there is an urgent need for a tool that partially automates this process and is able to retrieve relations between drugs and virus mutations from literature.

20.
Designing protein sequences that can fold into a given structure is a well-known inverse protein-folding problem. One important characteristic to attain for a protein design program is the ability to recover wild-type sequences given their native backbone structures. The highest average sequence identity accuracy achieved by current protein-design programs in this problem is around 30%, achieved by our previous system, SPIN. SPIN is a program that predicts sequences compatible with a provided structure using a neural network with fragment-based local and energy-based nonlocal profiles. Our new model, SPIN2, uses a deep neural network and additional structural features to improve on SPIN. SPIN2 achieves over 34% in sequence recovery in 10-fold cross-validation and independent tests, an improvement of 4 percentage points over the previous version. The sequence profiles generated from SPIN2 are expected to be useful for improving existing fold recognition and protein design techniques. SPIN2 is available at http://sparks-lab.org .
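
The sequence recovery figure quoted above can be understood through a minimal sketch. It assumes that recovery is the fraction of positions at which the designed residue equals the native residue in an ungapped, equal-length comparison; the abstract does not give the exact protocol. The function name and the two toy sequences are hypothetical.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue matches the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        return 0.0
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)

native = "MKTAYIAKQR"      # toy wild-type sequence (assumption)
designed = "MKSAYLAKHR"    # toy designed sequence (assumption)
recovery = sequence_recovery(native, designed)
```

Reported values are usually averaged over many target structures, so a single pair like this one only illustrates the per-protein quantity.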
