Similar literature
 Found 18 similar articles; search time: 203 ms
1.
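Drug development is critically important but also very costly in labor and resources. Computational methods that predict drug-protein binding affinity can greatly accelerate the process. The key to drug-target affinity prediction is an accurate, detailed representation of both the drug and the protein. This paper proposes a drug-target affinity prediction model based on deep learning and multi-level information fusion, which aims to improve predictive performance by integrating multi-level information about drugs and proteins. First, the drug is represented in two forms, a molecular graph and an extended-connectivity fingerprint, which are learned by a graph convolutional neural network module and fully connected layers, respectively. Second, the protein sequence and protein K-mer features are fed into a convolutional neural network module and fully connected layers, respectively, to learn latent protein features. The features learned by the four channels are then fused, and fully connected layers produce the prediction. The method was validated on two benchmark drug-target affinity datasets and compared with existing models. The results show that the proposed model outperforms the baseline models, indicating that the strategy of integrating multi-level drug and protein information is effective for drug-target affinity prediction.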
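The protein K-mer features mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which k, alphabet, or normalization the authors used, so the 20-letter amino acid alphabet, the default k = 2, and the frequency normalization below are all assumptions, and `kmer_features` is a hypothetical helper name.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_features(sequence, k=2):
    """Return the normalized k-mer frequency vector of a protein sequence.

    The vector has len(AMINO_ACIDS) ** k entries, one per possible k-mer.
    K-mers containing non-standard residues are ignored.
    """
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[kmer] for kmer in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]


features = kmer_features("MKTAYIAKQR", k=2)
print(len(features))  # one entry per possible 2-mer
```

A fixed-length vector like this can be passed to fully connected layers regardless of sequence length, which is presumably why it is paired with a dense branch rather than a convolutional one.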

2.
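Machine learning makes it possible to process data intelligently and to make full use of the knowledge and value contained in the data. To explore ways of applying machine learning to intelligent analysis in landscape architecture, three experiments were carried out. Two relate to data analysis: an urban color impression study based on color clustering of survey images, and a landscape visual quality assessment based on image recognition, together with its deployment as a web application platform. The third relates to digital design: a terrain generation method for selecting among design schemes, comprising two sub-projects, terrain generation with a deep learning generative adversarial network (GAN), and masking followed by prediction of the elevation of unknown areas. The three experiments use algorithms from the three main branches of machine learning (classification, clustering, and regression) as well as GANs from deep learning, and offer new machine learning-based methods for traditional research problems. Applying machine learning in landscape architecture can therefore help researchers learn mutually reinforcing knowledge from multi-source data, identify problems, and propose new ways of solving them.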

3.
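Machine learning algorithms and applications in environmental microbiology research   Total citations: 1; self-citations: 0; citations by others: 1
Chen He, Tao Ye, Mao Zhendu, Xing Peng. Acta Microbiologica Sinica (《微生物学报》), 2022, 62(12): 4646-4662
Microorganisms are ubiquitous in the environment. They are key participants in biogeochemical cycles and environmental evolution, and they play important roles in environmental monitoring, ecological remediation, and conservation. With the development of high-throughput technologies, large volumes of microbial data are being generated. Using machine learning to model and analyze environmental microbial big data is of great significance for both scientific research and societal applications in areas such as microbial biomarker identification, pollutant prediction, and environmental quality prediction. Machine learning can be divided into two broad categories: supervised and unsupervised learning. In microbiome research, unsupervised learning efficiently learns the features of the input data through methods such as clustering and dimensionality reduction, and then integrates and groups the microbial data. Supervised learning trains a model on microbial datasets that have both features and labels; when presented with data that have features but no labels, the model can infer the labels, enabling classification, recognition, and prediction for new data. However, complex machine learning algorithms usually focus on predictive accuracy at the expense of interpretability. Such models are often regarded as "black boxes" that predict a particular outcome, meaning that little is known about how the model arrives at its predictions. To apply machine learning more widely in microbiome research and improve our ability to extract valuable microbial information, it is especially important to understand machine learning algorithms in depth and to improve model interpretability. This review introduces the machine learning algorithms commonly used in environmental microbiology and the steps for building machine learning models from microbiome data, including feature selection, algorithm selection, model construction, and evaluation. It also surveys applications of various machine learning models in environmental microbiology, examines the associations between microbiomes and their surrounding environments, discusses ways to improve model interpretability, and provides a scientific reference for future environmental monitoring and environmental health prediction.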

4.
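To improve the efficiency of acquiring phenotypic data on external growth traits of shrimp, photographs of shrimp external phenotypes were used to train Faster R-CNN (Faster Region-convolutional neural networks), a deep learning network built on the Region Proposal Network (RPN). After learning from and training on 8,400 shrimp phenotype photographs, a model was built that rapidly identifies the total length of a shrimp and outputs its position. The model detects shrimp in an image and marks their positions with bounding boxes. For shrimp photographed from different angles, the length or diagonal length of the bounding box generated by the model is highly correlated with the manually measured total length. On this basis, the study established a high-throughput technique for measuring shrimp total length. The technique saves manual measurement time when collecting growth-trait phenotypic data and improves the efficiency of genomic selection breeding of shrimp. The model also suggests a new approach to measuring other external phenotypes, such as cephalothorax length and the lengths of individual body segments, and lays an important foundation for building a shrimp growth-trait phenome.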

5.
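In recent years, continued advances in computer hardware, software tools, and data abundance have expanded the application of artificial intelligence, typified by machine learning, in biology, basic medicine, and pharmacy. This has greatly advanced these fields and, in particular, transformed drug development. Identifying drug-target interactions (DTI) is a major challenge in drug development and a popular area for cross-disciplinary work with artificial intelligence. Researchers have done extensive work on DTI prediction, built many important databases, and developed or extended a variety of machine learning algorithms and software tools. This article describes the basic workflow of machine learning-based DTI prediction, reviews studies that use machine learning to predict DTI, and briefly summarizes the strengths and weaknesses of different machine learning methods for this task, with the aim of supporting the development of more effective prediction algorithms and of DTI prediction in general.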

6.
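Xia Binbin, Wang Jun. Chinese Journal of Biotechnology (《生物工程学报》), 2021, 37(11): 3863-3879
With the rapid accumulation of protein sequence and structure data, researchers urgently need ways to use this mass of descriptive information effectively: to extract information efficiently from existing data and apply it to downstream tasks. Protein design frees the development of new proteins from the limits of experimental conditions, which matters for drug target prediction, new drug development, materials design, and other fields. Deep learning is an efficient method of extracting features from data; it can be used to model protein data and then, with prior information added, to design proteins. Deep learning-based protein design has therefore become a research field with broad prospects. This article describes deep learning-based methods for modeling and designing protein sequence and structure data, detailing their strategies, principles, scope of application, and application examples. It also discusses the prospects and limitations of deep learning in this field, with the aim of providing a reference for related research.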

7.
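Objective: Long non-coding RNAs play important roles in heredity, metabolism, and the regulation of gene expression. However, determining RNA tertiary structure with traditional experimental methods is time-consuming, expensive, and technically demanding, and computational prediction of RNA tertiary structure has seen no breakthrough in the past decade. New algorithms are therefore needed to predict RNA tertiary structure accurately. This paper develops a base contact map prediction method that can be used to improve the accuracy of RNA tertiary structure prediction. Methods: To exploit the physicochemical features of RNA, the paper applies a deep learning algorithm that combines a multi-layer fully convolutional neural network with a recurrent neural network to predict contact probabilities between RNA bases, and uses an attention mechanism to handle interdependent features among bases in the RNA sequence. Results: By combining multi-layer neural networks with an attention mechanism, the method captures both local and global information in the RNA features, improving the robustness and generalization ability of the model. Test calculations show that, for the four standard cutoffs based on sequence length L (L/10, L/5, L/2, and L), the model predicts base contact maps with accuracies of 0.84, 0.82, 0.82, and 0.75, respectively. Conclusion: An attention-based deep learning algorithm can improve the accuracy of RNA base contact map prediction and thereby aid the prediction of RNA tertiary structure.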
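The L/10, L/5, L/2, and L figures in this abstract are most likely the usual "top L/k" contact-map metric: rank all candidate base pairs by predicted contact probability, keep the L/k highest-scoring pairs, and report the fraction that are true contacts. The abstract does not define the metric, so the sketch below assumes that convention. The function name `top_fraction_precision` and the `min_sep` parameter (minimum sequence separation) are hypothetical.

```python
def top_fraction_precision(prob, truth, divisor, min_sep=1):
    """Precision of the L // divisor highest-scoring base pairs.

    prob  -- L x L nested list of predicted contact probabilities
    truth -- L x L nested list of 0/1 true contacts
    """
    length = len(prob)
    # Consider each unordered pair (i, j) once, with j at least min_sep after i.
    pairs = [
        (prob[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    pairs.sort(key=lambda item: item[0], reverse=True)
    top = pairs[:max(1, length // divisor)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if truth[i][j])
    return hits / len(top)


# Toy 4-base example: true contacts are (0, 1) and (2, 3).
prob = [
    [0.0, 0.9, 0.2, 0.8],
    [0.9, 0.0, 0.1, 0.3],
    [0.2, 0.1, 0.0, 0.7],
    [0.8, 0.3, 0.7, 0.0],
]
truth = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
for divisor in (4, 2, 1):
    print(divisor, top_fraction_precision(prob, truth, divisor))
```

Stricter cutoffs such as L/10 score only the most confident predictions, which is consistent with the reported accuracy falling from 0.84 at L/10 to 0.75 at L.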

8.
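Objective: Combination therapy with anticancer drugs is a promising treatment strategy. Selecting highly synergistic drug combinations for a specific cancer type is crucial to improving treatment efficacy, but identifying synergistic combinations is complex and difficult. This study aims to optimize high-throughput virtual screening of anticancer drug combinations in a fully data-driven, computational way, providing a theoretical reference for "repositioning old drugs in new combinations". Methods: Drawing on the idea of matrix completion, a computational model based on nuclear norm regularization, NNRM, was built to predict the synergy score and synergy status of anticancer drug combinations. For a fixed cell line, a symmetric observation matrix of synergy scores is constructed; a fold-splitting technique is used to sparsify the observation matrix; and the model is solved with the alternating direction method of multipliers (ADMM) and soft-threshold estimation. Results: When NNRM was applied to the dataset released by O'Neil and colleagues, the root mean square error between predicted and observed synergy scores was 14.78 and the accuracy of the predicted synergy status was 0.94, better than random forest (RF) and support vector machine (SVM) and fully comparable with deep learning models. In addition, some of the values NNRM predicted for missing entries agree with existing studies or clinical practice. Conclusion: NNRM enables large-scale, batch prediction of the synergy of anticancer drug combinations. It greatly reduces the data requirements and computational cost of existing models, shortens the testing time of high-throughput virtual screening, and can serve as an optional tool for high-throughput virtual screening of anticancer drug combinations.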
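The "soft-threshold estimation" step named in this abstract is, in nuclear-norm problems generally, the shrinkage operator S_tau(x) = sign(x) * max(|x| - tau, 0) applied to the singular values of a matrix inside each ADMM iteration. The sketch below shows only that operator on a plain list of values. It is not the NNRM implementation, whose details the abstract does not give, and the threshold `tau` is a hypothetical parameter.

```python
import math


def soft_threshold(values, tau):
    """Shrink each value toward zero by tau; values within tau of zero become zero."""
    return [math.copysign(max(abs(v) - tau, 0.0), v) for v in values]


# Applied to singular values, shrinkage zeroes the small ones,
# which is what pushes the completed matrix toward low rank.
singular_values = [3.0, 1.0, 0.5]
print(soft_threshold(singular_values, tau=1.0))
```

In a full solver, the shrunk singular values would be recombined with the singular vectors to form the next low-rank estimate of the synergy matrix.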

9.
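At the mesoscopic scale, mouse brain image data can reach the order of 10 TB, and human brain data reach a staggering tens of PB. Identifying and analyzing neuronal morphology from such massive brain image data is a complex and challenging task. Researchers have proposed neuron recognition algorithms based on both traditional machine learning and deep learning. Traditional machine learning methods transfer and generalize poorly. Deep learning algorithms can improve generalization with large amounts of precisely annotated training data, but precise and rich annotated image datasets are lacking, so these algorithms also suffer from overfitting and weak generalization. This paper proposes a deep learning-based weakly supervised neuron recognition scheme that needs only a small amount of annotated data and uses an iterative strategy to obtain accurate recognition results for massive numbers of neuron images, giving strong generalization while minimizing manual effort. The method was tested on datasets including fMOST and BigNeuron, where the automatic recognition F1 scores were 0.9247 and 0.8318, respectively, better than those of the other neuron recognition algorithms compared.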
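For reference, the F1 score reported here is the harmonic mean of precision and recall. The sketch below computes all three from raw counts. The counts in the usage line are made-up illustration values, not data from the paper, and `precision_recall_f1` is a hypothetical helper name.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustration only: 8 correct detections, 2 spurious, 2 missed.
print(precision_recall_f1(tp=8, fp=2, fn=2))
```

Because F1 penalizes both spurious and missed detections, it is a common single-number summary when true negatives are ill-defined, as they are in object detection tasks.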

10.
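As research on multi-attribute decision making has deepened, traditional methods can no longer answer more fine-grained questions. Closely examining theories that make precise predictions and establishing formal relationships between models and data have become more promising research directions. Neural network models are designed to simulate many parallel cognitive and neural behaviors. They can learn from examples and adapt through transfer, and they have potential for explanation and prediction that traditional methods lack. Neural networks can represent linear compensatory and non-compensatory rules at the same time, and their applications have spread to many disciplines. The network paradigm is also valuable for personnel research and practice: studies indicate that neural networks can be used across the general field of human resource management, making them a new paradigm for research on personnel decisions.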

11.
Abstract

An accurate and rapid model for predicting toxic gas concentration plays an important role in the emergency response to a sudden gas leak. However, existing dispersion models struggle to meet accuracy and efficiency requirements at the same time. Although some researchers have developed new forecasting models with traditional machine learning, such as back propagation (BP) neural networks and support vector machines (SVM), the accuracy of their predictions still needs improvement. This paper therefore proposes new prediction models based on deep learning, which has clear advantages over traditional machine learning in prediction and classification. Deep belief networks (DBNs) and convolutional neural networks (CNNs) are used to build the new dispersion models. Both models are compared with the Gaussian plume model, a computational fluid dynamics (CFD) model, and models based on traditional machine learning in terms of accuracy, prediction time, and computation time. The experimental results show that the CNN model performs better when all evaluation indexes are considered.

12.
To predict rice blast, many machine learning methods have been proposed. As the quality and quantity of input data are essential for machine learning techniques, this study develops three artificial neural network (ANN)-based rice blast prediction models by combining two ANN models, the feed-forward neural network (FFNN) and long short-term memory (LSTM), with diverse input datasets, and compares their performance. The Blast_Weather_FFNN model had the highest recall score (66.3%) for rice blast prediction. This model requires two types of input data: blast occurrence data for the last 3 years and weather data (daily maximum temperature, relative humidity, and precipitation) between January and July of the prediction year. This study showed that the performance of an ANN-based disease prediction model was improved by applying suitable machine learning techniques together with the optimization of hyperparameter tuning involving input data. Moreover, we highlight the importance of the systematic collection of long-term disease data.

13.
With the development of artificial intelligence (AI) technologies and the availability of large amounts of biological data, computational methods for proteomics have progressed from traditional machine learning to deep learning. This review focuses on computational approaches and tools for predicting protein-DNA/RNA interactions using machine intelligence techniques. We provide an overview of the development of these computational methods and summarize their advantages and shortcomings. We further compile applications in tasks related to protein-DNA/RNA interactions and point out possible future application trends. Moreover, the strategies for digitizing biological sequences used in different types of computational methods are also summarized and discussed.

14.
Antimicrobial resistance is a growing health concern. Antimicrobial peptides (AMPs) disrupt harmful microorganisms by nonspecific mechanisms, making it difficult for microbes to develop resistance. Accordingly, they are promising alternatives to traditional antimicrobial drugs. In this study, we developed an improved AMP classification model, called AMP-BERT. We propose a deep learning model with a fine-tuned bidirectional encoder representations from transformers (BERT) architecture designed to extract structural/functional information from input peptides and identify each input as AMP or non-AMP. We compared the performance of our proposed model and other machine/deep learning-based methods. Our model, AMP-BERT, yielded the best prediction results among all models evaluated with our curated external dataset. In addition, we utilized the attention mechanism in BERT to implement an interpretable feature analysis and determine the specific residues in known AMPs that contribute to peptide structure and antimicrobial function. The results show that AMP-BERT can capture the structural properties of peptides for model learning, enabling the prediction of AMPs or non-AMPs from input sequences. AMP-BERT is expected to contribute to the identification of candidate AMPs for functional validation and drug development. The code and dataset for the fine-tuning of AMP-BERT are publicly available at https://github.com/GIST-CSBL/AMP-BERT.

15.
This review presents a modern perspective on dynamical systems in the context of current goals and open challenges. In particular, our review focuses on the key challenges of discovering dynamics from data and finding data-driven representations that make nonlinear systems amenable to linear analysis. We explore various challenges in modern dynamical systems, along with emerging techniques in data science and machine learning to tackle them. The two chief challenges are (1) nonlinear dynamics and (2) unknown or partially known dynamics. Machine learning is providing new and powerful techniques for both challenges. Dimensionality reduction methods project dynamical systems into reduced form and are computationally efficient on real-world data. Data-driven models aim to discover the governing equations and the underlying physical laws. Identifying dynamical systems with deep learning techniques has succeeded in inferring physical systems. Machine learning also provides powerful new algorithms for nonlinear dynamics. Advanced deep learning methods such as autoencoders, recurrent neural networks, convolutional neural networks, and reinforcement learning are used in the modeling of dynamical systems.

16.
Despite growing concerns over the health of global invertebrate diversity, terrestrial invertebrate monitoring efforts remain poorly geographically distributed. Machine-assisted classification has been proposed as a potential solution to quickly gather large amounts of data; however, previous studies have often used unrealistic or idealized datasets to train and test their models. In this study, we describe a practical methodology for including machine learning in ecological data acquisition pipelines. Here we train and test machine learning algorithms to classify over 72,000 terrestrial invertebrate specimens from morphometric data and contextual metadata. All vouchered specimens were collected in pitfall traps by the National Ecological Observatory Network (NEON) at 45 locations across the United States from 2016 to 2019. Specimens were photographed, and two separate machine learning paradigms were used to classify them. In the first, we used a convolutional neural network (ResNet-50), and in the second, we extracted morphometric data as feature vectors using ImageJ and used traditional machine learning methods to classify specimens. Issues stemming from inconsistent taxonomic label specificity were resolved by making classifications at the lowest identified taxonomic level (LITL). Taxa with too few specimens to be included in the training dataset were classified by the model using zero-shot classification. When classifying specimens that were known and seen by our models, we reached a maximum accuracy of 72.7% using eXtreme Gradient Boosting (XGBoost) at the LITL. This nearly matched the maximum accuracy achieved by the CNN of 72.8% at the LITL. Models that were trained without contextual metadata underperformed models with contextual metadata. We also classified invertebrate taxa that were unknown to the model using zero-shot classification, reaching a maximum accuracy of 65.5% when using the ResNet-50, compared to 39.4% when using XGBoost. The general methodology outlined here represents a realistic application of machine learning as a tool for ecological studies. We found that more advanced and complex machine learning methods such as convolutional neural networks are not necessarily more accurate than traditional machine learning methods. Hierarchical and LITL classifications allow for flexible taxonomic specificity at the input and output layers. These methods also help address the ‘long tail’ problem of underrepresented taxa missed by machine learning models. Finally, we encourage researchers to consider more than just morphometric data when training their models, as we have shown that the inclusion of contextual metadata can provide significant improvements to accuracy.

17.
Advances in biological and medical technologies are providing explosive volumes of biological and physiological data, such as medical images, electroencephalography, and genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some of the state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural networks and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, some examples of deep learning applications are presented, including medical image classification, genomic sequence analysis, and protein structure classification and prediction. Finally, we offer our perspectives on future directions in the field of deep learning.

18.
Classification methods used in machine learning (e.g., artificial neural networks, decision trees, and k-nearest neighbor clustering) are rarely used with population genetic data. We compare different nonparametric machine learning techniques with parametric likelihood estimations commonly employed in population genetics for purposes of assigning individuals to their population of origin ("assignment tests"). Classifier accuracy was compared across simulated data sets representing different levels of population differentiation (low and high F(ST)), number of loci surveyed (5 and 10), and allelic diversity (average of three or eight alleles per locus). Empirical data for the lake trout (Salvelinus namaycush) exhibiting levels of population differentiation comparable to those used in simulations were examined to further evaluate and compare classification methods. Classification error rates associated with artificial neural networks and likelihood estimators were lower for simulated data sets compared to k-nearest neighbor and decision tree classifiers over the entire range of parameters considered. Artificial neural networks only marginally outperformed the likelihood method for simulated data (0-2.8% lower error rates). The performance of each machine learning classifier improved relative to likelihood estimators for empirical data sets, suggesting an ability to "learn" and utilize properties of empirical genotypic arrays intrinsic to each population. Likelihood-based estimation methods provide a more accessible option for reliable assignment of individuals to the population of origin due to the intricacies in development and evaluation of artificial neural networks.
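A likelihood-based assignment test of the kind this abstract compares against can be sketched as follows. It assumes Hardy-Weinberg proportions within each population and independence across loci, and it floors unseen allele frequencies at a small constant. The abstract does not state which likelihood estimator was used, so this is one common formulation rather than the authors' method. The population names, allele labels, and frequencies are invented for illustration.

```python
import math


def genotype_log_likelihood(genotype, locus_freqs, floor=1e-3):
    """Log-likelihood of a diploid multilocus genotype in one population.

    genotype    -- list of (allele_a, allele_b) tuples, one per locus
    locus_freqs -- list of {allele: frequency} dicts, one per locus
    floor       -- frequency assigned to alleles unseen in the population
    """
    total = 0.0
    for (a, b), freqs in zip(genotype, locus_freqs):
        p = max(freqs.get(a, 0.0), floor)
        q = max(freqs.get(b, 0.0), floor)
        # Hardy-Weinberg: p^2 for homozygotes, 2pq for heterozygotes.
        total += math.log(p * p if a == b else 2 * p * q)
    return total


def assign(genotype, populations):
    """Return the name of the population with the highest genotype likelihood."""
    return max(
        populations,
        key=lambda name: genotype_log_likelihood(genotype, populations[name]),
    )


# Hypothetical allele frequencies at two loci in two populations.
populations = {
    "north": [{"A": 0.9, "B": 0.1}, {"C": 0.8, "D": 0.2}],
    "south": [{"A": 0.2, "B": 0.8}, {"C": 0.3, "D": 0.7}],
}
print(assign([("A", "A"), ("C", "D")], populations))
```

The approach needs only per-population allele frequencies and no training loop, which fits the abstract's point that likelihood methods are more accessible than neural networks.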
