Similar articles
Found 20 similar articles; search took 15 ms.
1.
Min Xu, Zeng Wanwen, Chen Shengquan, Chen Ning, Chen Ting, Jiang Rui. BMC Bioinformatics 2017, 18(13):478-46

Background

With the rapid development of deep sequencing techniques in recent years, enhancers have been systematically identified in projects such as FANTOM and ENCODE, forming genome-wide landscapes in a series of human cell lines. Nevertheless, experimental approaches are still costly and time-consuming for large-scale identification of enhancers across a variety of tissues under different disease states, making computational identification of enhancers indispensable.

Results

To facilitate the identification of enhancers, we propose a computational framework, named DeepEnhancer, to distinguish enhancers from background genomic sequences. Our method relies purely on DNA sequences to predict enhancers in an end-to-end manner using a deep convolutional neural network (CNN). We train our deep learning model on permissive enhancers and then adopt a transfer learning strategy to fine-tune the model on enhancers specific to a cell line. Results demonstrate the effectiveness and efficiency of our method in classifying enhancers against random sequences, showing the advantages of deep learning over traditional sequence-based classifiers. We then construct a variety of neural networks with different architectures and show the usefulness of techniques such as max-pooling and batch normalization in our method. To make our approach interpretable, we further visualize convolutional kernels as sequence logos and successfully identify similar motifs in the JASPAR database.

Conclusions

DeepEnhancer enables the identification of novel enhancers using only DNA sequences via a highly accurate deep learning model. The proposed computational framework can also be applied to similar problems, thereby prompting the use of machine learning methods in life sciences.
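
The abstract above says the model works from DNA sequence alone. A common way to present a sequence to a CNN is one-hot encoding; the abstract does not state which encoding DeepEnhancer uses, so the sketch below is an assumption for illustration, and the function name `one_hot_encode` is hypothetical.

```python
# Illustrative sketch only: one-hot encode a DNA sequence as a 4 x L matrix.
# The row order (A, C, G, T) is an assumption, not taken from the paper.
BASES = "ACGT"


def one_hot_encode(seq):
    """Return 4 rows (A, C, G, T), each of length len(seq).

    Bases outside A/C/G/T (for example N) become all-zero columns.
    """
    seq = seq.upper()
    return [[1 if base == b else 0 for base in seq] for b in BASES]


matrix = one_hot_encode("ACGTN")
for base, row in zip(BASES, matrix):
    print(base, row)
```

Each column has at most one non-zero entry, so a convolutional kernel sliding along the sequence axis can be read as a position weight matrix, which is what makes visualizing kernels as sequence logos possible.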

2.
3.

One fundamental problem of protein biochemistry is to predict protein structure from amino acid sequence. The inverse problem, predicting either entire sequences or individual mutations that are consistent with a given protein structure, has received much less attention even though it has important applications in both protein engineering and evolutionary biology. Here, we ask whether 3D convolutional neural networks (3D CNNs) can learn the local fitness landscape of protein structure to reliably predict either the wild-type amino acid or the consensus in a multiple sequence alignment from the local structural context surrounding the site of interest. We find that the network can predict the wild type with good accuracy, and that network confidence is a reliable measure of whether a given prediction is likely to be correct. Predictions of the consensus are less accurate and are primarily driven by whether or not the consensus matches the wild type. Our work suggests that high-confidence mis-predictions of the wild type may identify sites that are primed for mutation and are likely targets for protein engineering.


4.
Co-evolutionary models such as direct coupling analysis (DCA), in combination with machine learning (ML) techniques based on deep neural networks, are able to predict accurate protein contact or distance maps. Such information can be used as constraints in structure prediction and massively increases prediction accuracy. Unfortunately, the same ML methods cannot readily be applied to RNA, as they rely on large structural datasets that are only available for proteins. Here, we demonstrate how the smaller amount of data available for RNA can be used to improve prediction of RNA contact maps. We introduce an algorithm called CoCoNet that is based on a combination of a Coevolutionary model and a shallow Convolutional Neural Network. Despite its simplicity and the small number of trained parameters, the method boosts the positive predictive value (PPV) of predicted contacts by about 70% with respect to DCA, as tested by cross-validation on about eighty RNA structures. However, the direct inclusion of the CoCoNet contacts in 3D modeling tools does not result in a proportional increase in 3D RNA structure prediction accuracy. Therefore, we suggest that the field develop, in addition to contact PPV, metrics that better estimate the expected impact on 3D structure modeling tools. CoCoNet is freely available and can be found at https://github.com/KIT-MBS/coconet.
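
The positive predictive value used in the abstract above is the fraction of predicted contacts that are true contacts. A minimal sketch follows, assuming contacts are ranked by score and only the top `top_n` pairs are evaluated; the function name and the toy data are hypothetical and are not taken from CoCoNet.

```python
def contact_ppv(predicted, true_contacts, top_n):
    """PPV = TP / (TP + FP) among the top_n highest-scored predicted pairs.

    predicted:     list of ((i, j), score) tuples
    true_contacts: set of (i, j) tuples with i < j
    """
    ranked = sorted(predicted, key=lambda item: item[1], reverse=True)[:top_n]
    if not ranked:
        return 0.0
    # Normalise each pair to (low, high) before looking it up.
    hits = sum(1 for pair, _ in ranked if tuple(sorted(pair)) in true_contacts)
    return hits / len(ranked)


# Toy data, invented for illustration only.
predicted = [((1, 10), 0.9), ((2, 9), 0.8), ((3, 20), 0.7), ((4, 7), 0.2)]
true_contacts = {(1, 10), (2, 9), (4, 7)}
print(contact_ppv(predicted, true_contacts, top_n=3))
```

Because PPV only counts how many of the top-ranked pairs are correct, it says nothing about whether those pairs are the ones a 3D modeling tool needs most, which is consistent with the abstract's call for additional metrics.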

5.
The local environment and land use have changed considerably over the past hundred years. Historical documents and materials are crucial for understanding and tracking these changes, and therefore for understanding the impact and consequences of land-use change. This, in turn, is important when identifying restoration projects that can reverse or reduce harmful and unsustainable effects of land-use change. This work extracts information on the historical location and geographical distribution of wetlands from hand-drawn maps. This is achieved using deep learning (DL), specifically a convolutional neural network (CNN). The CNN model is trained on a manually pre-labelled dataset of historical wetlands in Jönköping County, Sweden, all extracted from the historical map called “Generalstabskartan”. The presented CNN performs well and achieves an F1-score of 0.886 when evaluated using 10-fold cross-validation over the data. The trained models are additionally used to generate a GIS layer of the presumed historical geographical distribution of wetlands for the area depicted in the southern collection of Generalstabskartan, which covers the southern half of Sweden. This GIS layer is released as an open resource and can be freely used. To summarise, the presented results show that CNNs can be a useful tool for extracting and digitising non-textual information in historical documents, such as historical maps. This research produces a modern GIS material that can be used to further understand past land-use change. Previously, no material of this detail and extent has been available, owing to the large effort needed to create it manually. With the presented resource, better quantifications and estimates of lost historical wetlands can be made.
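
The F1-score and 10-fold cross-validation mentioned in the abstract above can be sketched as follows. This is a generic illustration using contiguous, unshuffled folds, which is an assumption; the abstract does not describe how the folds were drawn, and both function names are hypothetical.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds.

    The first n_samples % k folds receive one extra sample.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


print(f1_score(tp=8, fp=2, fn=2))
print([len(test) for _, test in k_fold_indices(25, 10)])
```

In a 10-fold evaluation, a model is trained on each `train` split and scored on the matching `test` split, and the ten scores are then summarised, typically by their mean.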

6.

Coral reef research and management efforts can be improved when supported by reef maps that provide local-scale details across global extents. However, such maps are difficult to generate because of the broad geographic range of coral reefs, the complexities of relating satellite imagery to geomorphic or ecological realities, and other challenges. Nevertheless, reef extent maps are among the most commonly used and most valuable data products from the perspective of reef scientists and managers. Here, we used convolutional neural networks to generate a globally consistent coral reef probability map (a probabilistic estimate of the geospatial extent of reef ecosystems) to facilitate scientific, conservation, and management efforts. We combined a global mosaic of high spatial resolution Planet Dove satellite imagery with regional Millennium Coral Reef Mapping Project reef extents to build training, validation, and application datasets. These datasets trained our reef extent prediction model, a neural network with a dense-unet architecture followed by a random forest classifier, which was used to produce a global coral reef probability map. Based on this probability map, we generated a global coral reef extent map from a 60% threshold of reef probability (reef: probability ≥ 60%, non-reef: probability < 60%). Our findings provide a proof-of-concept method for global reef extent estimates using a consistent and readily updateable methodology that leverages modern deep learning approaches to support downstream users. These maps are openly available through the Allen Coral Atlas.
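
The thresholding rule stated in the abstract above (reef if probability ≥ 60%, non-reef otherwise) is simple enough to show directly. The grid values and the function name below are invented for illustration; only the 60% cut-off comes from the abstract.

```python
REEF_THRESHOLD = 0.60  # cut-off stated in the abstract


def classify_reef(prob_map, threshold=REEF_THRESHOLD):
    """Turn a 2D grid of reef probabilities into a boolean extent map.

    True means reef (probability >= threshold), False means non-reef.
    """
    return [[p >= threshold for p in row] for row in prob_map]


# Toy probability grid, invented for illustration only.
probabilities = [
    [0.59, 0.60],
    [0.95, 0.10],
]
for row in classify_reef(probabilities):
    print(row)
```

Keeping the probability map and deriving the extent map from it means a downstream user could pick a different threshold without rerunning the model.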


7.
Fish species recognition is an important task for preserving ecosystems, supplying food, and supporting tourism. In particular, the Pantanal is a wetland region that harbors hundreds of species and is considered one of the most important ecosystems in the world. In this paper, we present a new method based on convolutional neural networks (CNNs) for Pantanal fish species recognition. A new CNN composed of three branches that classify the fish species, family, and order is proposed with the aim of improving the recognition of species with similar characteristics. The branch that classifies the fish species uses information learned from the family and order, which has been shown to improve the overall accuracy. Results on an unrestricted image dataset showed that the proposed method outperforms traditional approaches. Our method obtained an accuracy of 0.873, versus 0.864 for a traditional CNN, in the recognition of 68 fish species. In addition, our method provides fish family and order recognition, which obtained accuracies of 0.938 and 0.96, respectively. We hope that, with these promising results, an automatic tool can be developed to monitor species in an important region such as the Pantanal.

8.
The importance of T cells in immunotherapy has motivated developing technologies to improve therapeutic efficacy. One objective is assessing antigen-induced T cell activation because only functionally active T cells are capable of killing the desired targets. Autofluorescence imaging can distinguish T cell activity states in a non-destructive manner by detecting endogenous changes in metabolic co-enzymes such as NAD(P)H. However, recognizing robust activity patterns is computationally challenging in the absence of exogenous labels. We demonstrate machine learning methods that can accurately classify T cell activity across human donors from NAD(P)H intensity images. Using 8260 cropped single-cell images from six donors, we evaluate classifiers ranging from traditional models that use previously-extracted image features to convolutional neural networks (CNNs) pre-trained on general non-biological images. Adapting pre-trained CNNs for the T cell activity classification task provides substantially better performance than traditional models or a simple CNN trained with the autofluorescence images alone. Visualizing the images with dimension reduction provides intuition into why the CNNs achieve higher accuracy than other approaches. Our image processing and classifier training software is available at https://github.com/gitter-lab/t-cell-classification.

9.
Lauraceae and Fagaceae are two large woody plant families that are predominant in the low- and middle-altitude regions of Taiwan. The high interspecific similarity among some species in these families limits their management and utilization. This work proposes an approach for identifying 15 Lauraceae species and 20 Fagaceae species using leaf images and convolutional neural networks (CNNs). Leaf specimens of the 35 species were collected from the northern, central, and southern parts of Taiwan. Images of the leaves were acquired using flat-bed scanners. Three CNN architectures (DenseNet-121, MobileNet V2, and Xception) were trained. Xception achieved the highest mean test accuracy of 99.39%, and MobileNet V2 required the shortest mean test time of 17.1 ms per image using a GPU. The saliency maps revealed that the characteristics learned by the models matched the leaf features used by botanists. A pruning algorithm, gate decorator, was applied to the trained models, reducing the number of parameters and the number of floating-point operations of MobileNet V2 by 55.4% and 69.1%, respectively, while the model accuracy was maintained at 92.03%. Thus, MobileNet V2 has the potential to be used for identifying Lauraceae and Fagaceae species on mobile devices.

10.
Pest infestation is a major cause of crop damage and lost revenues worldwide. Automatic identification of invasive insects would significantly speed up the recognition of pests and expedite their removal. In this paper, we generated ensembles of CNNs based on different topologies (EfficientNetB0, ResNet50, GoogleNet, ShuffleNet, MobileNetv2, and DenseNet201) optimized with different Adam variants for pest identification. Two new Adam algorithms for deep network optimization based on DGrad are proposed that introduce a scaling factor in the learning rate. Six CNN architectures that vary in their optimization function were trained on the Deng (SMALL), large IP102, and Xie2 (D0) pest data sets. Ensembles were compared and evaluated using several performance indicators. The best performing ensemble, which combined the CNNs using the different Adam variants, including the new ones proposed here, competed with human expert classifications on the Deng data set and achieved state of the art on all three insect data sets: 95.52% on Deng, 74.11% on IP102, and 99.81% on Xie2. Additional tests were performed on data sets for medical imagery classification that further validated the robustness and power of the proposed Adam optimization variants. All MATLAB source code is available at https://github.com/LorisNanni/.

11.
Abstract

For high-accuracy classification of DNA sequences with Convolutional Neural Networks (CNNs), it is essential to use an efficient sequence representation that can accelerate similarity comparison between DNA sequences. In addition, CNNs can be improved by avoiding the dimensionality problem associated with multi-layer CNN features. This paper presents a new approach for classification of bacterial DNA sequences based on a custom layer. A CNN is used with the Frequency Chaos Game Representation (FCGR) of DNA. The FCGR is adopted as the sequence representation, with a suitable choice of the word length k used to count k-length word occurrences in DNA sequences. Mapping a DNA sequence with FCGR produces an image of the gene sequence that displays both local and global patterns. A pre-trained CNN is built for image classification. First, the image is converted to feature maps through convolutional layers. This is sometimes followed by a down-sampling operation, performed by pooling layers, that reduces the spatial size of the feature map and removes redundant spatial information. Random Projection (RP) with an activation function, which preserves the variety of the data while introducing some randomness, is suggested instead of the pooling layers. Feature reduction is achieved while keeping high accuracy for classifying bacteria into taxonomic levels. The simulation results show that the proposed CNN based on RP offers a trade-off between accuracy and processing time.
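
The FCGR described in the abstract above counts k-length words and places each count in a 2^k × 2^k grid, giving an image a CNN can consume. The sketch below is a minimal illustration under stated assumptions: the corner assignment for A, C, G, T and the choice that the first base selects the coarsest quadrant are conventions picked here, not taken from the paper, and the function name `fcgr` is hypothetical.

```python
# Assumed corner convention: each base contributes one (x, y) bit pair.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k):
    """Return a 2^k x 2^k grid of k-mer counts (rows indexed by y).

    Every distinct k-mer maps to exactly one cell. K-mers containing a
    symbol outside A/C/G/T are skipped.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in CORNERS for base in kmer):
            continue
        x = y = 0
        for base in kmer:
            bx, by = CORNERS[base]
            # Here the first base is the most significant bit; some
            # implementations use the last base for that role instead.
            x = (x << 1) | bx
            y = (y << 1) | by
        grid[y][x] += 1
    return grid


for row in fcgr("ACGTACGT", k=2):
    print(row)
```

The grid size depends only on k, not on sequence length, so sequences of different lengths yield images of the same shape; counts are often normalised to frequencies before training.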

12.
Deep learning is a powerful approach for distinguishing classes of images, and there is a growing interest in applying these methods to delimit species, particularly in the identification of mosquito vectors. Visual identification of mosquito species is the foundation of mosquito-borne disease surveillance and management, but can be hindered by cryptic morphological variation in mosquito vector species complexes such as the malaria-transmitting Anopheles gambiae complex. We sought to apply Convolutional Neural Networks (CNNs) to images of mosquitoes as a proof-of-concept to determine the feasibility of automatic classification of mosquito sex, genus, species, and strains using whole-body, 2D images of mosquitoes. We introduce a library of 1,709 images of adult mosquitoes collected from 16 colonies of mosquito vector species and strains originating from five geographic regions, with 4 cryptic species not readily distinguishable morphologically even by trained medical entomologists. We present a methodology for image processing, data augmentation, and training and validation of a CNN. Our best CNN configuration achieved high prediction accuracies of 96.96% for species identification and 98.48% for sex. Our results demonstrate that CNNs can delimit species with cryptic morphological variation, 2 strains of a single species, and specimens from a single colony stored using two different methods. We present visualizations of the CNN feature space and predictions for interpretation of our results, and we further discuss applications of our findings to future malaria mosquito surveillance.

13.

14.
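Foraminifera are tiny, abundant, geographically widespread, and fast-evolving. They are an important record of marine sedimentary environments and play a very important role in the division and correlation of marine biostratigraphy. Because foraminifera comprise many genera and species, traditional identification must be carried out manually by experienced specialists and is time-consuming; manual identification of fossils also suffers from a shortage of trained personnel and a heavy workload. Applying convolutional neural networks from the field of computer vision can address these problems well. Guided by palaeontologists' annotations of Miocene planktonic foraminifera fossils, and classifying the fossil images by viewing direction, we combined convolutional neural network algorithms to develop an image recognition system for foraminifera fossils. We found that, by classifying the ventral, edge, and dorsal views of the fossils and adopting a two-stage identification algorithm to recognize Miocene planktonic foraminifera at the genus level, the genus-level identification accuracy reaches about 82%.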

15.
Background: Quantitative analysis of mitochondrial morphology plays important roles in studies of mitochondrial biology. The analysis depends critically on segmentation of mitochondria, the image analysis process of extracting mitochondrial morphology from images. The main goal of this study is to characterize the performance of convolutional neural networks (CNNs) in segmentation of mitochondria from fluorescence microscopy images. Recently, CNNs have achieved remarkable success in challenging image segmentation tasks in several disciplines. So far, however, our knowledge of their performance in segmenting biological images remains limited. In particular, we know little about their robustness, which defines their capability of segmenting biological images of different conditions, and their sensitivity, which defines their capability of detecting subtle morphological changes of biological objects. Methods: We have developed a method that uses realistic synthetic images of different conditions to characterize the robustness and sensitivity of CNNs in segmentation of mitochondria. Using this method, we compared performance of two widely adopted CNNs: the fully convolutional network (FCN) and the U-Net. We further compared the two networks against the adaptive active-mask (AAM) algorithm, a representative of high-performance conventional segmentation algorithms. Results: The FCN and the U-Net consistently outperformed the AAM in accuracy, robustness, and sensitivity, often by a significant margin. The U-Net provided overall the best performance. Conclusions: Our study demonstrates superior performance of the U-Net and the FCN in segmentation of mitochondria. It also provides quantitative measurements of the robustness and sensitivity of these networks that are essential to their applications in quantitative analysis of mitochondrial morphology.

16.
This paper investigates finite-time synchronization of an array of coupled neural networks via discontinuous controllers. Based on Lyapunov function method and the discontinuous version of finite-time stability theory, some sufficient criteria for finite-time synchronization are obtained. Furthermore, we propose switched control and adaptive tuning parameter strategies in order to reduce the settling time. In addition, pinning control scheme via a single controller is also studied in this paper. With the hypothesis that the coupling network topology contains a directed spanning tree and each of the strongly connected components is detail-balanced, we prove that finite-time synchronization can be achieved via pinning control. Finally, some illustrative examples are given to show the validity of the theoretical results.

17.
The pathway for novel lead drug discovery has many major deficiencies, the most significant of which is the immense size of small molecule diversity space. Methods that increase the search efficiency and/or reduce the size of the search space increase the rate at which useful lead compounds are identified. Artificial neural networks optimized via evolutionary computation provide a cost- and time-effective solution to this problem. Here, we present results that suggest preclustering of small molecules prior to neural network optimization is useful for generating models of quantitative structure-activity relationships for a set of HIV inhibitors. Using these methods, it is possible to prescreen compounds to separate active from inactive compounds, or even active and mildly active compounds from inactive compounds, with high predictive accuracy while simultaneously reducing the feature space. It is also possible to identify "human interpretable" features from the best models that can be used for proposal and synthesis of new compounds in order to optimize potency and specificity.

18.
The electric sense combines spatial aspects of vision and touch with temporal features of audition. Its accessible neural architecture shares similarities with mammalian sensory systems and allows for recordings from successive brain areas to test hypotheses about neural coding. Further, electrosensory stimuli encountered during prey capture, navigation, and communication, can be readily synthesized in the laboratory. These features enable analyses of the neural circuitry that reveal general principles of encoding and decoding, such as segregation of information into separate streams and neural response sparsification. A systems level understanding arises via linkage between cellular differentiation and network architecture, revealed by in vitro and in vivo analyses, while computational modeling reveals how single cell dynamics and connectivity shape the sparsification process.

19.
Intrinsically disordered regions (IDRs) play an important role in key biological processes and are closely related to human diseases. IDRs have great potential to serve as targets for drug discovery, most notably in disordered binding regions. Accurate prediction of IDRs is challenging because their genome-wide occurrence and a low ratio of disordered residues make them difficult targets for traditional classification techniques. Existing computational methods mostly rely on sequence profiles to improve accuracy, which is time-consuming and computationally expensive. This article describes an ab initio sequence-only prediction method, based on reduced amino acid alphabets and convolutional neural networks (CNNs), that tries to overcome the challenge of accurate prediction posed by IDRs. We experiment with six different 3-letter reduced alphabets. We argue that the dimensional reduction in the input alphabet facilitates the detection of complex patterns within the sequence by the convolutional step. Experimental results show that our proposed IDR predictor performs at the same level as or outperforms other state-of-the-art methods in the same class, achieving an accuracy of 0.76 and an AUC of 0.85 on the publicly available Critical Assessment of protein Structure Prediction dataset (CASP10). Therefore, our method is suitable for proteome-wide disorder prediction, yielding similar or better accuracy than existing approaches at a faster speed.
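
A reduced amino acid alphabet, as used in the abstract above, maps the 20 standard residues onto a smaller set of symbols before the sequence reaches the network. The abstract does not list its six 3-letter alphabets, so the grouping below (hydrophobic, polar, charged) is a hypothetical example chosen only to show the mechanics; the names `GROUPS` and `reduce_sequence` are also assumptions.

```python
# Hypothetical 3-letter grouping, NOT one of the paper's six alphabets.
GROUPS = {
    "H": "AVLIMFWC",  # hydrophobic (assumed grouping)
    "P": "STNQYGP",   # polar / small (assumed grouping)
    "C": "DEKRH",     # charged (assumed grouping)
}

# Invert the grouping into a residue -> symbol lookup table.
LOOKUP = {aa: symbol for symbol, residues in GROUPS.items() for aa in residues}


def reduce_sequence(protein, unknown="X"):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues outside the 20 standard amino acids map to `unknown`.
    """
    return "".join(LOOKUP.get(aa, unknown) for aa in protein.upper())


print(reduce_sequence("MKVSDE"))
```

With three symbols instead of twenty, a one-hot input has 3 channels rather than 20, which is the dimensional reduction the abstract argues helps the convolutional step.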

20.
Remote sensing images obtained by unoccupied aircraft systems (UAS) across different seasons can capture species-specific phenological patterns of tropical trees. However, the application of UAS multi-season images to classify tropical tree species is still poorly understood. In this study, we used RGB images from different seasons obtained by a low-cost UAS and convolutional neural networks (CNNs) to map tree species in an Amazonian forest. Individual tree crowns (ITC) were outlined in the UAS images and identified to the species level using forest inventory data. The CNN model was trained with images obtained in February, May, August, and November. The classification accuracy in the rainy season (November and February) was higher than in the dry season (May and August). Fusing images from multiple seasons improved the average accuracy of tree species classification by up to 21.1 percentage points, reaching 90.5%. The CNN model can learn species-specific phenological characteristics that affect the classification accuracy, such as leaf fall in the dry season, which highlights its potential to discriminate species in various conditions. We produced high-quality individual tree crown maps of the species using a post-processing procedure. The combination of multi-season UAS images and CNNs has the potential to map tree species in the Amazon, providing valuable insights for forest management and conservation initiatives.
