Similar Documents
20 similar documents found (search time: 78 ms)
1.
Advances in cryo-electron microscopy (cryo-EM) for high-resolution imaging of biomolecules in solution have provided new challenges and opportunities for algorithm development for 3D reconstruction. Next-generation volume reconstruction algorithms that combine generative modelling with end-to-end unsupervised deep learning techniques have shown promise, but many technical and theoretical hurdles remain, especially when applied to experimental cryo-EM images. In light of the proliferation of such methods, we propose here a critical review of recent advances in the field of deep generative modelling for cryo-EM reconstruction. The present review aims to (i) provide a unified statistical framework using terminology familiar to machine learning researchers with no specific background in cryo-EM, (ii) review the current methods in this framework, and (iii) outline outstanding bottlenecks and avenues for improvement in the field.
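The statistical framework such reviews describe rests on a generative (forward) model of image formation. A minimal sketch, assuming a toy voxel grid, an axis-aligned projection, and additive Gaussian noise (all names and values here are illustrative; real pipelines also model the contrast transfer function and arbitrary particle rotations):

```python
import random

def project(volume):
    """Project a 3D voxel grid (a list of z-slices) onto the xy-plane
    by summing along z -- the idealized cryo-EM image-formation step."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    return [[sum(volume[z][y][x] for z in range(nz)) for x in range(nx)]
            for y in range(ny)]

def noisy_image(volume, sigma, rng):
    """Add i.i.d. Gaussian noise to the projection; cryo-EM images are
    extremely noisy (sigma here is an arbitrary toy value)."""
    proj = project(volume)
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in proj]

# toy 2x2x2 "density": one bright column of voxels
vol = [[[0.0, 0.0], [0.0, 1.0]],
       [[0.0, 0.0], [0.0, 1.0]]]
rng = random.Random(0)
clean = project(vol)           # [[0.0, 0.0], [0.0, 2.0]]
img = noisy_image(vol, sigma=0.1, rng=rng)
```

Reconstruction methods invert this map: given many noisy projections at unknown orientations, recover the volume (or, for the deep generative methods reviewed, a distribution over volumes).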

2.
Protein engineering seeks to identify protein sequences with optimized properties. When guided by machine learning, protein sequence generation methods can draw on prior knowledge and experimental efforts to improve this process. In this review, we highlight recent applications of machine learning to generate protein sequences, focusing on the emerging field of deep generative methods.

3.
Bringing a drug from research and development to clinical application takes a long time, and R&D costs can exceed a billion yuan. With the integration of pharmaceutical R&D and artificial intelligence, and the rapid development of bioinformatics, drug-activity-related data have increased dramatically, and traditional experimental approaches to drug activity prediction can no longer meet the demands of drug development. Algorithm-assisted drug development, which addresses a wide range of problems in the pipeline, can greatly accelerate the process. Traditional machine learning methods, especially random forests, support vector machines, and artificial neural networks, can achieve high prediction accuracy for drug activity. Deep learning, with its multi-layer neural networks, can accept high-dimensional input variables without hand-engineered features and can fit relatively complex functions, further improving efficiency at every stage of drug development. The deep learning models most widely applied to drug activity prediction are deep neural networks (DNN), recurrent neural networks (RNN), and autoencoders (AE), while generative adversarial networks (GAN), owing to their ability to generate data, are often combined with other models for data augmentation. Recent research on deep learning for drug molecular activity prediction shows that deep learning models exceed both traditional experimental methods and traditional machine learning methods in accuracy and efficiency. Deep learning models are therefore likely to become the most important computational aids in drug development over the next decade.

4.
Deep learning for computer vision has shown promising results in the field of entomology; however, much potential remains untapped. Deep learning performance is enabled primarily by large quantities of annotated data which, outside of rare circumstances, are limited in ecological studies. Currently, to utilize deep learning systems, ecologists undergo extensive data collection efforts or limit their problem to niche tasks. These solutions do not scale to region-agnostic models. However, there are solutions that employ data augmentation, simulators, generative models, and self-supervised learning that can supplement limited labelled data. Here, we highlight the success of deep learning for computer vision within entomology, discuss data collection efforts, provide methodologies for optimizing learning from limited annotations, and conclude with practical guidelines for how to achieve a foundation model for entomology capable of accessible automated ecological monitoring on a global scale.
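The data augmentation mentioned above multiplies scarce annotations with label-preserving transforms. A minimal sketch on a toy 2D image (real augmentation pipelines add crops, color jitter, and more):

```python
def hflip(img):
    """Horizontal flip: label-preserving for most insect images."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate a 2D grid 90 degrees counterclockwise."""
    return [list(row) for row in zip(*img)][::-1]

def augment(img):
    """Expand one labelled image into several training examples:
    the original, its mirror, and three rotations."""
    out = [img, hflip(img)]
    r = img
    for _ in range(3):
        r = rot90(r)
        out.append(r)
    return out

img = [[1, 2],
       [3, 4]]
variants = augment(img)  # 5 training examples from one annotation
```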

5.
6.
Machine learning or deep learning models have been widely used for taxonomic classification of metagenomic sequences, and many studies have reported high classification accuracy. Such models are usually trained on sequences from several training classes in the hope of accurately classifying unknown sequences into these classes. However, when deploying the classification models on real testing data sets, sequences that do not belong to any of the training classes may be present and are falsely assigned to one of the training classes with high confidence. Such sequences are referred to as out-of-distribution (OOD) sequences and are ubiquitous in metagenomic studies. To address this problem, we develop a deep generative model-based method, MLR-OOD, that measures the probability of a testing sequence being OOD by the likelihood ratio between the maximum of the in-distribution (ID) class-conditional likelihoods and the Markov chain likelihood of the testing sequence, which measures sequence complexity. We compose three different microbial data sets consisting of bacterial, viral, and plasmid sequences for comprehensively benchmarking OOD detection methods. We show that MLR-OOD achieves state-of-the-art performance, demonstrating its generality across various types of microbial data sets. MLR-OOD is also robust to GC content, a major confounding factor in OOD detection for genomic sequences. In conclusion, MLR-OOD will greatly reduce false positives caused by OOD sequences in metagenomic sequence classification.
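The likelihood-ratio score can be sketched as follows — here with simple nucleotide-frequency models standing in for the ID classes (the real MLR-OOD uses deep generative class models), and a first-order Markov chain fitted to the query itself as the complexity correction:

```python
import math
from collections import Counter

def class_loglik(seq, freqs):
    """Log-likelihood under an i.i.d. nucleotide-frequency model for one ID class."""
    return sum(math.log(freqs[c]) for c in seq)

def markov_loglik(seq):
    """First-order Markov log-likelihood fitted on the sequence itself,
    acting as a sequence-complexity term (add-one smoothing, alphabet size 4)."""
    pairs = Counter(zip(seq, seq[1:]))
    starts = Counter(seq[:-1])
    ll = 0.0
    for (a, b), n in pairs.items():
        p = (n + 1) / (starts[a] + 4)
        ll += n * math.log(p)
    return ll

def ood_score(seq, class_models):
    """MLR-OOD-style score: max ID class log-likelihood minus the
    Markov-chain log-likelihood; lower values suggest OOD."""
    best = max(class_loglik(seq, f) for f in class_models)
    return best - markov_loglik(seq)

at_rich = {'A': 0.4, 'T': 0.4, 'C': 0.1, 'G': 0.1}
gc_rich = {'A': 0.1, 'T': 0.1, 'C': 0.4, 'G': 0.4}
score_id = ood_score("ATATATAT", [at_rich, gc_rich])
score_ood = ood_score("GGGGCCCC", [at_rich])  # no GC-rich class available
```

Here `score_ood` comes out lower than `score_id`: the GC-rich query fits none of the available classes, so the ratio flags it as out-of-distribution.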

7.

Background

Genomic variations are associated with the metabolism of, and the occurrence of adverse reactions to, many therapeutic agents. Polymorphisms at over 2000 locations of the cytochrome P450 (CYP) enzymes, arising from factors such as ethnicity, mutation, and inheritance, contribute to the diversity of responses and side effects across drugs. The associations between single nucleotide polymorphisms (SNPs), internal pharmacokinetic patterns, and vulnerability to specific adverse reactions have become a research interest of pharmacogenomics. Conventional genome-wide association studies (GWAS) mainly focus on relating single or multiple SNPs to a specific risk factor, a one-to-many relation. However, there are no robust methods to establish a many-to-many network that can combine the direct and indirect associations between multiple SNPs and a series of events (e.g. adverse reactions, metabolic patterns, prognostic factors, etc.). In this paper, we present a novel deep learning model based on generative stochastic networks and a hidden Markov chain to classify observed samples, genotyped at five loci of two genes (CYP2D6 and CYP1A2), into populations vulnerable to 14 types of adverse reactions.

Methods

A supervised deep learning model is proposed in this study: a revised generative stochastic network (GSN) in which transitions are driven by a hidden Markov chain. The training data were collected from clinical observation. The training set comprises 83 blood samples genotyped at CYP2D6*2, *10, *14 and CYP1A2*1C, *1F by the polymerase chain reaction (PCR) method. A hidden Markov chain is used as the transition operator to simulate the probabilistic distribution. The model can learn at lower cost than conventional maximum likelihood estimation because the transition distribution is conditioned on the previous state of the hidden Markov chain. A least absolute shrinkage and selection operator (LASSO) regression and a k-nearest neighbors (kNN) algorithm are used as baselines to evaluate the performance of our proposed deep learning model.

Results

There were 53 adverse reactions reported during the observation, assigned to 14 categories. In the comparison of classification accuracy, the deep learning model outperforms the LASSO and kNN models, with an accuracy above 80%. In the comparison of reliability, the deep learning model shows the best stability among the three models.

Conclusions

Machine learning provides a new way to explore the complex associations between genomic variations and multiple events in pharmacogenomics studies. The new deep learning algorithm is capable of classifying various SNPs to the corresponding adverse reactions. We expect that as more genomic variations are added as features and more observations are made, the deep learning model will improve its performance and can act as a black-box but reliable verifier for other GWAS studies.

8.
One of the central problems in computational neuroscience is to understand how the object-recognition pathway of the cortex learns a deep hierarchy of nonlinear feature detectors. Recent progress in machine learning shows that it is possible to learn deep hierarchies without requiring any labelled data. The feature detectors are learned one layer at a time and the goal of the learning procedure is to form a good generative model of images, not to predict the class of each image. The learning procedure only requires the pairwise correlations between the activations of neuron-like processing units in adjacent layers. The original version of the learning procedure is derived from a quadratic ‘energy’ function but it can be extended to allow third-order, multiplicative interactions in which neurons gate the pairwise interactions between other neurons. A technique for factoring the third-order interactions leads to a learning module that again has a simple learning rule based on pairwise correlations. This module looks remarkably like modules that have been proposed by both biologists trying to explain the responses of neurons and engineers trying to create systems that can recognize objects.
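In its simplest form, the pairwise-correlation learning rule described here is the contrastive-divergence update of a restricted Boltzmann machine: the weight change is the difference between ⟨vh⟩ measured on the data and after one reconstruction step. A toy sketch (network size, learning rate, and the training pattern are arbitrary illustrations):

```python
import math
import random

rng = random.Random(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v, W):
    """Sample binary hidden units given visible units (pairwise energy only)."""
    return [1 if rng.random() < sigmoid(sum(W[i][j] * v[i] for i in range(len(v))))
            else 0 for j in range(len(W[0]))]

def sample_visible(h, W):
    return [1 if rng.random() < sigmoid(sum(W[i][j] * h[j] for j in range(len(h))))
            else 0 for i in range(len(W))]

def cd1_update(v0, W, lr=0.1):
    """One contrastive-divergence step: the update uses only pairwise
    correlations <v h> under the data and after one reconstruction."""
    h0 = sample_hidden(v0, W)
    v1 = sample_visible(h0, W)
    h1 = sample_hidden(v1, W)
    for i in range(len(v0)):
        for j in range(len(h0)):
            W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
    return W

W = [[0.0] * 2 for _ in range(4)]  # 4 visible, 2 hidden units
for _ in range(50):
    cd1_update([1, 1, 0, 0], W)    # repeatedly show one toy pattern
```

The biologically appealing point the abstract makes is visible in `cd1_update`: the rule is local, needing only correlations between units in adjacent layers.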

9.
Y. Li, B. Sixou, F. Peyrin, IRBM, 2021, 42(2):120-133
Super-resolution problems are widely discussed in medical imaging. The spatial resolution of medical images is often insufficient due to constraints such as image acquisition time, low irradiation dose, or hardware limits. To address these problems, different super-resolution methods have been proposed, such as optimization- or learning-based approaches. Recently, deep learning has become a thriving technology that is developing rapidly, making a review of its current role in medical imaging super-resolution timely. In this paper, we first briefly introduce deep learning methods, then present a number of important deep learning approaches to super-resolution problems, covering different architectures as well as up-sampling operations. Afterwards, we focus on applications of deep learning methods to medical imaging super-resolution and present the challenges that remain to be overcome.
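For reference, the simplest non-learned up-sampling operation that deep super-resolution models aim to improve on is nearest-neighbour interpolation (a pure-Python toy; learned methods replace this with sub-pixel or transposed convolutions):

```python
def nearest_upsample(img, factor):
    """Nearest-neighbour up-sampling: each pixel is replicated
    factor x factor times, giving a blocky high-resolution image."""
    out = []
    for row in img:
        expanded = [p for p in row for _ in range(factor)]
        out.extend([list(expanded) for _ in range(factor)])
    return out

lowres = [[1, 2],
          [3, 4]]
highres = nearest_upsample(lowres, 2)
# highres: [[1,1,2,2],[1,1,2,2],[3,3,4,4],[3,3,4,4]]
```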

10.
Several theories propose that the cortex implements an internal model to explain, predict, and learn about sensory data, but the nature of this model is unclear. One condition that could be highly informative here is Charles Bonnet syndrome (CBS), where loss of vision leads to complex, vivid visual hallucinations of objects, people, and whole scenes. CBS could be taken as indication that there is a generative model in the brain, specifically one that can synthesise rich, consistent visual representations even in the absence of actual visual input. The processes that lead to CBS are poorly understood. Here, we argue that a model recently introduced in machine learning, the deep Boltzmann machine (DBM), could capture the relevant aspects of (hypothetical) generative processing in the cortex. The DBM carries both the semantics of a probabilistic generative model and of a neural network. The latter allows us to model a concrete neural mechanism that could underlie CBS, namely, homeostatic regulation of neuronal activity. We show that homeostatic plasticity could serve to make the learnt internal model robust against e.g. degradation of sensory input, but overcompensate in the case of CBS, leading to hallucinations. We demonstrate how a wide range of features of CBS can be explained in the model and suggest a potential role for the neuromodulator acetylcholine. This work constitutes the first concrete computational model of CBS and the first application of the DBM as a model in computational neuroscience. Our results lend further credence to the hypothesis of a generative model in the brain.
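The homeostatic mechanism invoked here can be sketched as a simple excitability (bias) adaptation driving each unit toward a target firing rate; when input is removed, the adaptation overcompensates, making spontaneous activity more likely (all constants below are illustrative, not from the paper):

```python
def homeostatic_step(bias, observed_rate, target_rate=0.1, eta=0.5):
    """Raise a neuron's excitability when it fires below target, lower it
    when above -- the regulation hypothesized to overcompensate in CBS."""
    return bias + eta * (target_rate - observed_rate)

# with sensory input removed, the observed firing rate collapses to ~0
bias = 0.0
for _ in range(20):
    bias = homeostatic_step(bias, observed_rate=0.0)
# bias has grown steadily, biasing the network toward spontaneous
# (hallucinated) activity even without input
```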

11.
While deep learning models have seen increasing applications in protein science, few have been implemented for protein backbone generation—an important task in structure-based problems such as active site and interface design. We present a new approach to building class-specific backbones, using a variational auto-encoder to directly generate the 3D coordinates of immunoglobulins. Our model is torsion- and distance-aware, learns a high-resolution embedding of the dataset, and generates novel, high-quality structures compatible with existing design tools. We show that the Ig-VAE can be used with Rosetta to create a computational model of a SARS-CoV-2 RBD binder via latent-space sampling. We further demonstrate that the model’s generative prior is a powerful tool for guiding computational protein design, motivating a new paradigm under which backbone design is solved as a constrained optimization problem in the latent space of a generative model.
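Latent-space sampling of the kind used to generate new backbones can be sketched with the VAE reparameterization trick; the decoder below is a hypothetical linear map standing in for the trained Ig-VAE decoder (all weights and dimensions are toy values):

```python
import math
import random

rng = random.Random(42)

def sample_latent(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def decode(z, W):
    """Stand-in linear decoder mapping a latent code to flattened 3D coordinates."""
    return [sum(W[i][k] * z[k] for k in range(len(z))) for i in range(len(W))]

W = [[0.5, -0.2], [0.1, 0.3], [0.0, 1.0]]     # toy 2-D latent -> 3 coordinates
z = sample_latent([0.0, 0.0], [-2.0, -2.0])   # sample near the prior mean
coords = decode(z, W)
```

Sampling many such `z` and decoding each yields an ensemble of candidate structures; constrained design then optimizes over `z` rather than over raw coordinates.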

12.
In this mini review, we capture the latest progress of applying artificial intelligence (AI) techniques based on deep learning architectures to molecular de novo design with a focus on integration with experimental validation. We will cover the progress and experimental validation of novel generative algorithms, the validation of QSAR models and how AI-based molecular de novo design is starting to become connected with chemistry automation. While progress has been made in the last few years, it is still early days. The experimental validations conducted thus far should be considered proof-of-principle, providing confidence that the field is moving in the right direction.

13.
De novo drug design is the process of generating novel lead compounds with desirable pharmacological and physicochemical properties. The application of deep learning (DL) to de novo drug design has become a hot topic, and many DL-based approaches have been developed for molecular generation tasks. Generally, these approaches fall into four frameworks: recurrent neural networks; encoder-decoder models; reinforcement learning; and generative adversarial networks. In this review, we first introduce the molecular representations and assessment metrics used in DL-based de novo drug design. Then, we summarize the features of each architecture. Finally, we discuss the potential challenges and future directions of DL-based molecular generation.
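The recurrent-neural-network framework generates molecules autoregressively, one token at a time until an end token. A minimal stand-in using a hand-written next-token table in place of learned RNN weights (the token set and transitions are toy inventions, not valid chemistry):

```python
import random

rng = random.Random(1)

# toy next-token choices standing in for an RNN's softmax output
TRANSITIONS = {
    "^": ["C"],                 # start token always opens a carbon chain
    "C": ["C", "O", "N", "$"],  # continue the chain or stop
    "O": ["C", "$"],
    "N": ["C", "$"],
}

def sample_smiles(max_len=10):
    """Autoregressive sampling: emit tokens until the end token '$'."""
    tok, out = "^", []
    while len(out) < max_len:
        tok = rng.choice(TRANSITIONS[tok])
        if tok == "$":
            break
        out.append(tok)
    return "".join(out)

mols = [sample_smiles() for _ in range(5)]
```

A real generator replaces `TRANSITIONS` with a learned conditional distribution over the full SMILES vocabulary; reinforcement learning then reshapes that distribution toward desired properties.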

14.
Accurate retention time (RT) prediction is important for spectral library-based analysis in data-independent acquisition mass spectrometry-based proteomics. The deep learning approach has demonstrated superior performance over traditional machine learning methods for this purpose. The transformer architecture is a recent development in deep learning that delivers state-of-the-art performance in many fields such as natural language processing, computer vision, and biology. We assess the performance of the transformer architecture for RT prediction using datasets from five deep learning models: Prosit, DeepDIA, AutoRT, DeepPhospho, and AlphaPeptDeep. The experimental results on holdout datasets and independent datasets exhibit state-of-the-art performance of the transformer architecture. The software and evaluation datasets are publicly available for future development in the field.
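As a contrast to the transformer models evaluated, the classic RT baseline is a linear retention-coefficient model that simply sums per-residue contributions (the coefficients below are arbitrary illustrations, not a published scale):

```python
# hypothetical per-residue retention coefficients (illustrative only)
COEFF = {"A": 0.5, "L": 1.6, "K": -1.0, "G": 0.0, "F": 1.8, "S": -0.2}

def predict_rt(peptide, intercept=2.0):
    """Additive model: RT = intercept + sum of residue coefficients.
    Transformers instead model positional and contextual effects that
    such an additive model cannot capture."""
    return intercept + sum(COEFF[aa] for aa in peptide)

rt = predict_rt("ALKG")  # 2.0 + 0.5 + 1.6 - 1.0 + 0.0 = 3.1
```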

15.
Deep learning approaches have produced substantial breakthroughs in fields such as image classification and natural language processing and are making rapid inroads in the area of protein design. Many generative models of proteins have been developed that encompass all known protein sequences, model specific protein families, or extrapolate the dynamics of individual proteins. Those generative models can learn protein representations that are often more informative of protein structure and function than hand-engineered features. Furthermore, they can be used to quickly propose millions of novel proteins that resemble the native counterparts in terms of expression level, stability, or other attributes. The protein design process can further be guided by discriminative oracles to select candidates with the highest probability of having the desired properties. In this review, we discuss five classes of generative models that have been most successful at modeling proteins and provide a framework for model guided protein design.
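The generate-then-filter loop described above can be sketched as: propose candidates from a generative model, score each with a discriminative oracle, and keep the top-k. Both models below are hypothetical stand-ins (a uniform random sequence generator and a hydrophobicity-counting oracle):

```python
import random

rng = random.Random(7)
AMINO = "ACDEFGHIKLMNPQRSTVWY"

def generate(n, length=8):
    """Stand-in generative model: uniform random sequences; a real model
    would propose sequences resembling native proteins."""
    return ["".join(rng.choice(AMINO) for _ in range(length)) for _ in range(n)]

def oracle(seq):
    """Stand-in discriminative oracle: score by hydrophobic content."""
    return sum(seq.count(a) for a in "AILMFVW") / len(seq)

def design(n_candidates=1000, k=5):
    """Model-guided design loop: propose, score, select the top-k."""
    pool = generate(n_candidates)
    return sorted(pool, key=oracle, reverse=True)[:k]

top = design()
```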

16.
We review state-of-the-art computational methods for constructing, from image data, generative statistical models of cellular and nuclear shapes and the arrangement of subcellular structures and proteins within them. These automated approaches allow consistent analysis of images of cells for the purposes of learning the range of possible phenotypes, discriminating between them, and informing further investigation. Such models can also provide realistic geometry and initial protein locations to simulations in order to better understand cellular and subcellular processes. To determine the structures of cellular components and how proteins and other molecules are distributed among them, the generative modeling approach described here can be coupled with high throughput imaging technology to infer and represent subcellular organization from data with few a priori assumptions. We also discuss potential improvements to these methods and future directions for research.

17.
Single-particle cryo-electron microscopy (cryo-EM) is a technique that takes projection images of biomolecules frozen at cryogenic temperatures. A major advantage of this technique is its ability to image single biomolecules in heterogeneous conformations. While this poses a challenge for data analysis, recent algorithmic advances have enabled the recovery of heterogeneous conformations from the noisy imaging data. Here, we review methods for the reconstruction and heterogeneity analysis of cryo-EM images, ranging from linear-transformation-based methods to nonlinear deep generative models. We overview the dimensionality-reduction techniques used in heterogeneous 3D reconstruction methods and specify what information each method can infer from the data. Then, we review the methods that use cryo-EM images to estimate probability distributions over conformations, either in reduced subspaces or in spaces predefined by atomistic simulations. We conclude with the ongoing challenges for the cryo-EM community.

18.
Particle tracking in living systems requires low light exposure and short exposure times to avoid phototoxicity and photobleaching and to fully capture particle motion with high-speed imaging. Low-excitation light comes at the expense of tracking accuracy. Image restoration methods based on deep learning dramatically improve the signal-to-noise ratio in low-exposure data sets, qualitatively improving the images. However, it is not clear whether images generated by these methods yield accurate quantitative measurements such as diffusion parameters in (single) particle tracking experiments. Here, we evaluate the performance of two popular deep learning denoising software packages for particle tracking, using synthetic data sets and movies of diffusing chromatin as biological examples. With synthetic data, both supervised and unsupervised deep learning restored particle motions with high accuracy in two-dimensional data sets, whereas artifacts were introduced by the denoisers in three-dimensional data sets. Experimentally, we found that, while both supervised and unsupervised approaches improved tracking results compared with the original noisy images, supervised learning generally outperformed the unsupervised approach. We find that nicer-looking image sequences are not synonymous with more precise tracking results and highlight that deep learning algorithms can produce deceiving artifacts with extremely noisy images. Finally, we address the challenge of selecting parameters to train convolutional neural networks by implementing a frugal Bayesian optimizer that rapidly explores multidimensional parameter spaces, identifying networks yielding optimal particle tracking accuracy. Our study provides quantitative outcome measures of image restoration using deep learning. We anticipate broad application of this approach to critically evaluate artificial intelligence solutions for quantitative microscopy.
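The quantitative measurement at stake — a diffusion coefficient — is typically estimated from the mean squared displacement (MSD) of a track. A minimal 2D sketch (the frame interval and ground-truth D are toy values):

```python
import random

rng = random.Random(3)

def simulate_track(n_steps, D, dt):
    """2D Brownian track: each axis step is Gaussian with variance 2*D*dt."""
    x = y = 0.0
    track = [(x, y)]
    s = (2 * D * dt) ** 0.5
    for _ in range(n_steps):
        x += rng.gauss(0.0, s)
        y += rng.gauss(0.0, s)
        track.append((x, y))
    return track

def msd(track, lag):
    """Mean squared displacement at a given lag (in frames)."""
    disps = [(track[i + lag][0] - track[i][0]) ** 2 +
             (track[i + lag][1] - track[i][1]) ** 2
             for i in range(len(track) - lag)]
    return sum(disps) / len(disps)

def estimate_D(track, dt):
    """In 2D, MSD(lag) = 4*D*lag*dt, so D = MSD(1) / (4*dt)."""
    return msd(track, 1) / (4 * dt)

track = simulate_track(5000, D=0.5, dt=0.1)
D_hat = estimate_D(track, dt=0.1)  # close to the ground-truth 0.5
```

The concern raised by the paper is that denoising can subtly distort the localizations feeding into `msd`, biasing `D_hat` even when the images look visually cleaner.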

19.
Bolstered by recent methodological and hardware advances, deep learning has increasingly been applied to biological problems and structural proteomics. Such approaches have achieved remarkable improvements over traditional machine learning methods in tasks ranging from protein contact map prediction to protein folding, prediction of protein–protein interaction interfaces, and characterization of protein–drug binding pockets. In particular, emergence of ab initio protein structure prediction methods including AlphaFold2 has revolutionized protein structural modeling. From a protein function perspective, numerous deep learning methods have facilitated deconvolution of the exact amino acid residues and protein surface regions responsible for binding other proteins or small molecule drugs. In this review, we provide a comprehensive overview of recent deep learning methods applied in structural proteomics.

20.
Informative proteins are proteins that play critical functional roles inside cells and are fundamental to translating bioinformatics into clinical practice. Many methods for identifying informative biomarkers have been developed, but they are heuristic and arbitrary and do not consider the dynamic characteristics of biological processes. In this paper, we present a generative model that identifies informative proteins by systematically analyzing the topological variety of dynamic protein-protein interaction networks (PPINs). In this model, a common representation of multiple PPINs is learned using a deep feature generation model, based on which the original PPINs are rebuilt and the reconstruction errors are analyzed to locate the informative proteins. Experiments were conducted on data from yeast cell cycles and different prostate cancer stages. We analyze the effectiveness of the reconstruction by comparing different methods, and the rankings of informative proteins are compared with results from baseline methods. Our method reveals critical members of dynamic processes that can be further studied as candidate biomarkers.
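The core idea — rank proteins by how badly a learned low-dimensional model reconstructs their interaction profile — can be sketched with a rank-1 reconstruction obtained by power iteration, standing in for the deep feature generation model (the adjacency matrix below is a toy invention):

```python
def rank1_reconstruct(A, iters=100):
    """Approximate a symmetric adjacency matrix A by lam * u * u^T, where u
    is the top eigenvector found by power iteration -- a shallow stand-in
    for the deep reconstruction model."""
    n = len(A)
    u = [1.0] * n
    for _ in range(iters):
        v = [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        u = [x / norm for x in v]
    lam = sum(u[i] * sum(A[i][j] * u[j] for j in range(n)) for i in range(n))
    return [[lam * u[i] * u[j] for j in range(n)] for i in range(n)]

def node_errors(A):
    """Per-node reconstruction error; a high error marks a node whose
    interaction pattern deviates from the dominant network structure."""
    R = rank1_reconstruct(A)
    n = len(A)
    return [sum((A[i][j] - R[i][j]) ** 2 for j in range(n)) for i in range(n)]

# nodes 0-2 form a clique; node 3 attaches atypically to node 0 only
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
errs = node_errors(A)
```

Ranking nodes by `errs` surfaces the ones the low-rank model explains worst — the analogue of the informative proteins located from PPIN reconstruction errors.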
