Similar Articles
18 similar articles found.
1.
谢兵, 苏波, 刘宁. Chinese Journal of Bioinformatics (《生物信息学》), 2025, 23(2): 111-121
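In recent years, extensive cancer research has accumulated massive multi-omics data, which make it possible to identify cancer driver genes efficiently. This study proposes a heterophilic deep graph information maximization convolutional network (HDGICN) for identifying cancer driver genes. HDGICN first combines deep graph information maximization with the personalized PageRank algorithm to enhance the features of gene nodes in heterophilic biomolecular networks; it then learns gene features on these networks through hierarchical hybrid graph convolution that incorporates a dual residual structure; finally, it identifies cancer driver genes from their prediction scores. Experimental results show that on three heterophilic biomolecular networks, HDGICN outperforms other conventional methods in both the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC), and ablation experiments further indicate that the components of the method help improve predictive performance. HDGICN can effectively identify cancer driver genes on heterophilic biomolecular networks and may provide important support for precision cancer therapy and biomarker discovery.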
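The personalized PageRank step used for feature enhancement can be sketched as an APPNP-style diffusion in which each gene keeps a fraction `alpha` of its own features and takes the rest from its neighbours. This is a minimal sketch assuming a row-normalised adjacency matrix and a fixed number of power iterations; the function name and defaults are hypothetical, and the deep graph information maximization part of HDGICN is not shown.

```python
def ppr_propagate(adj, features, alpha=0.15, num_iters=50):
    """Smooth node features with personalized-PageRank (APPNP-style) diffusion.

    adj:      dict mapping a node index to a list of neighbour indices
    features: list of per-node feature vectors (lists of floats)
    Repeats Z <- alpha * X + (1 - alpha) * P Z, where P = D^-1 A averages
    over neighbours, so every node keeps a share of its own features.
    """
    n = len(features)
    z = [row[:] for row in features]  # start the diffusion from X
    for _ in range(num_iters):
        new_z = []
        for i in range(n):
            neighbours = adj.get(i, [])
            if neighbours:
                # (P Z)[i]: mean of the neighbours' current features
                agg = [sum(z[j][k] for j in neighbours) / len(neighbours)
                       for k in range(len(z[i]))]
            else:
                agg = z[i]  # isolated node: nothing to absorb
            new_z.append([alpha * x + (1 - alpha) * a
                          for x, a in zip(features[i], agg)])
        z = new_z
    return z
```

On a two-node graph with features `[1.0]` and `[0.0]`, the diffused values approach roughly 0.54 and 0.46: each node keeps part of its own signal while absorbing its neighbour's.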

2.
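MicroRNAs, small biological molecules, can regulate gene expression positively or negatively, and studying the relationships between microRNAs and genes is important both for maintaining homeostasis and for treating disease. Using deep learning to predict microRNA-gene targeting relationships, we propose the TransformerMGI model. In the feature-engineering stage, to address the difficulty of accurately extracting latent information from biological sequences, TransformerMGI uses the graph-convolutional-network-based GP-GCN method and the DNA2Vec model to extract latent information from microRNA and gene data, respectively, yielding a representation embedding matrix for each. On the model side, TransformerMGI introduces power normalization to improve the classical deep learning model. The two representation matrices obtained from feature extraction are fed into TransformerMGI, whose internal attention mechanism aggregates and correlates the feature information within and between the two inputs, and the model finally outputs the probability that a microRNA regulates a gene. Using the area under the ROC curve (AUC) and the area under the precision-recall curve (AUPRC) as evaluation metrics, TransformerMGI was compared with other existing models. Experimental results show that TransformerMGI achieves AUC and AUPRC scores above 0.91, outperforming the other existing models. Relying only on the nucleotide sequences of microRNAs and genes, without considering biological principles or genomic context, TransformerMGI predicts microRNA target genes and thus offers a reference deep learning approach for future research on microRNA target prediction.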
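The core attention step, in which rows of the microRNA embedding matrix attend over rows of the gene embedding matrix, can be illustrated with plain scaled dot-product cross-attention. This is a minimal sketch under stated assumptions: the function names are hypothetical, there are no learned projection matrices, and the power normalization used in TransformerMGI is omitted because the abstract does not specify its form.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows.

    queries:      microRNA embedding rows (n_q x d)
    keys, values: gene embedding rows (n_k x d and n_k x d_v)
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # weighted sum of the value rows, one output column at a time
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

When all keys are identical the softmax weights are uniform and each output row is the mean of the value rows; distinct keys shift the weight toward the gene positions most similar to each microRNA position.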

3.
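Bringing a drug from development to clinical use takes a long time, and R&D investment can exceed one billion yuan. With the integration of pharmaceutical R&D and artificial intelligence and the rapid development of bioinformatics, drug activity data have grown sharply, and traditional experimental approaches to drug activity prediction can no longer meet the needs of drug development. Using algorithms to assist drug development and solve its various problems can greatly accelerate the process. Traditional machine learning methods, especially random forests, support vector machines, and artificial neural networks, can achieve high accuracy in drug activity prediction. Because deep learning uses multilayer neural networks, its models can accept high-dimensional inputs without manually specified input features and can fit fairly complex functions; applied to drug development, it can further improve the efficiency of each stage. The deep learning models most widely used in drug activity prediction are deep neural networks (DNN), recurrent neural networks (RNN), and autoencoders (AE), while generative adversarial networks (GAN), owing to their ability to generate data, are often combined with other models for data augmentation. This review of recent research and applications of deep learning in drug molecular activity prediction indicates that deep learning models are more accurate and efficient than both traditional experimental methods and traditional machine learning methods. Deep learning models are therefore expected to become the most important auxiliary computational models in drug development over the next decade.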

4.
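Objective: Drug development is costly and slow and has a low success rate. Accurately predicting molecular properties is important for efficiently screening drug candidates and optimizing molecular structures. Traditional feature-engineering-based methods for molecular property prediction require researchers to have a deep disciplinary background and extensive expertise. As artificial intelligence has matured, many molecular property prediction algorithms that outperform traditional feature-engineering methods have emerged, but these models still suffer from scarce labeled data and poor generalization. To address this, we propose BGMF, a multimodal data-fusion algorithm for molecular property prediction based on Bert+GCN, which integrates multimodal data of drug molecules and makes full use of large numbers of unlabeled drug molecules to train the model to learn useful molecular information. Method: From each drug's SMILES string, BGMF extracts an atom sequence, a molecular fingerprint sequence, and molecular graph data, and learns features by combining the pretrained Bert model with a graph convolutional network (GCN). While mining global features of the "words" in a drug molecule, it fuses the local topological features of the molecular graph, making fuller use of the global-local contextual semantics of the molecule. A dual-decoder design over the atom sequence and the molecular fingerprint sequence then strengthens the molecular feature representation. Result: On 43 molecular property prediction tasks across 5 datasets, BGMF achieved higher AUC values than existing methods. An independent test set constructed for this study further showed that the model generalizes well. t-SNE visualization of the generated molecular fingerprint representations showed that BGMF captures the intrinsic structure and characteristics of different molecular fingerprints. Conclusion: By combining a graph convolutional network with Bert, BGMF integrates molecular graph data into the tasks of molecular fingerprint recovery and masked atom recovery, effectively capturing the intrinsic structure and characteristics of molecular fingerprints and thereby predicting drug molecular properties efficiently.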
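The masked-atom-recovery task starts from an atom-level tokenization of each SMILES string in which a fraction of atom tokens is hidden for the model to reconstruct. The sketch below assumes a regular expression commonly used for SMILES tokenization and a BERT-style 15% masking rate; the helper names are hypothetical, and BGMF's actual tokenizer and masking scheme may differ.

```python
import random
import re

# Regex commonly used to split SMILES into bracket atoms, two-letter halogens,
# bonds, branches and ring-closure digits.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    tokens = SMILES_REGEX.findall(smiles)
    # sanity check: the tokens must reassemble the original string
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles}")
    return tokens


def mask_atoms(tokens, mask_rate=0.15, seed=0, mask_token="[MASK]"):
    """Hide a random subset of atom tokens; return (masked tokens, {position: atom})."""
    rng = random.Random(seed)
    # atoms are bracket atoms or alphabetic tokens (C, Cl, c, n, ...)
    atom_positions = [i for i, t in enumerate(tokens) if t.startswith("[") or t.isalpha()]
    n_mask = min(len(atom_positions), max(1, round(mask_rate * len(atom_positions))))
    masked = list(tokens)
    targets = {}
    for i in sorted(rng.sample(atom_positions, n_mask)):
        targets[i] = tokens[i]  # what the model must recover
        masked[i] = mask_token
    return masked, targets
```

For aspirin, `mask_atoms(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))` hides a couple of atom tokens while leaving bonds, branches and ring digits visible as context.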

5.
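Drug development is very important but consumes considerable labor and resources. Computer-aided prediction of drug-protein affinity can greatly accelerate drug development, and the key to drug-target affinity prediction is an accurate and detailed representation of drugs and proteins. We propose a drug-target affinity prediction model based on deep learning and multi-level information fusion, which aims to achieve better predictive performance by integrating multi-level information about drugs and proteins. First, each drug is represented both as a molecular graph and as an extended-connectivity fingerprint, which are learned by a graph convolutional network module and a fully connected layer, respectively. Next, the protein sequence and the protein's K-mer features are fed into a convolutional neural network module and a fully connected layer, respectively, to learn latent protein features. The features learned by these four channels are then fused, and a fully connected layer makes the prediction. The effectiveness of the proposed method was verified on two benchmark drug-target affinity datasets and compared with other existing models. The results show that the proposed model outperforms the baseline models, indicating that integrating multi-level drug and protein information is an effective strategy for drug-target affinity prediction.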
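The protein K-mer channel needs a fixed-length vector per sequence. A minimal sketch, assuming K-mer frequencies over the 20 standard amino acids (the abstract states neither the value of K nor the normalization, and the function name is hypothetical), is:

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_features(sequence, k=3, normalize=True):
    """Fixed-length K-mer frequency vector for a protein sequence."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0.0] * len(vocab)
    total = 0
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in index:  # skip K-mers containing non-standard residues such as X
            counts[index[kmer]] += 1
            total += 1
    if normalize and total:
        counts = [c / total for c in counts]  # frequencies sum to 1
    return counts
```

With `k=3` the vector has 20^3 = 8000 entries and ignores residue order beyond each K-mer, complementing the CNN channel that reads the raw sequence.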

6.
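Objective: Long non-coding RNAs play important roles in heredity, metabolism, and the regulation of gene expression. However, determining RNA tertiary structures with traditional experimental methods is time-consuming, expensive, and technically demanding, and computational prediction of RNA tertiary structure has made no breakthrough progress in the past decade. New algorithms are therefore needed to predict RNA tertiary structure accurately. To this end, this paper develops a base contact map prediction method that can improve the accuracy of RNA tertiary structure prediction. Method: To exploit the physicochemical features of RNA, we apply a deep learning approach combining multilayer fully convolutional neural networks and recurrent neural networks to predict the contact probabilities between RNA bases, and use an attention mechanism to handle the interdependent features of bases in the RNA sequence. Result: By combining multilayer neural networks with an attention mechanism, the method effectively captures both local and global information in RNA features, improving the robustness and generalization of the model. Tests show that for the four standard thresholds defined relative to sequence length L (L/10, L/5, L/2, and L), the model's base contact map prediction accuracy reaches 0.84, 0.82, 0.82, and 0.75, respectively. Conclusion: The attention-based deep learning algorithm improves the accuracy of RNA base contact map prediction and can thereby aid RNA tertiary structure prediction.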
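Accuracies at L/10, L/5, L/2 and L are conventionally computed as the precision of that many top-ranked predicted contacts. A minimal sketch, assuming that convention and a minimum sequence separation of 6 bases between paired positions (both are assumptions, since the abstract gives neither; the function name is hypothetical), is:

```python
def top_contact_precision(prob, true_contacts, divisor, min_sep=6):
    """Precision of the top L/divisor predicted base-base contacts.

    prob:          L x L matrix (list of lists) of predicted contact probabilities
    true_contacts: set of (i, j) pairs, i < j, that are real contacts
    min_sep:       pairs closer than this along the sequence are ignored
    """
    length = len(prob)
    candidates = [(prob[i][j], i, j)
                  for i in range(length)
                  for j in range(i + min_sep, length)]
    candidates.sort(reverse=True)  # highest probability first
    top = candidates[:max(1, length // divisor)]
    hits = sum(1 for _, i, j in top if (i, j) in true_contacts)
    return hits / len(top)
```

Calling it with `divisor` set to 10, 5, 2 and 1 gives the four reported thresholds; smaller divisors evaluate more contacts and usually yield lower precision, matching the trend from 0.84 down to 0.75.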

7.
赵媛媛, 王耘. Chinese Journal of Bioinformatics (《生物信息学》), 2016, 14(4): 235-242
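The human body is a complex functional system, and the onset and progression of disease, especially complex disease, often involve multiple steps and multiple systems. A single drug can rarely meet the treatment needs of a complex disease, and drug combinations are becoming a new trend in drug development. In this paper, we construct a drug combination network and cluster it with the MCODE algorithm, obtaining 33 independent, internally tightly connected drug modules, 26 of which are used to treat a single complex disease. By analyzing six complex diseases in detail (cancer, pain, psoriasis, bacterial infection, rheumatoid arthritis, and chemotherapy-induced vomiting), we summarize the drug combination patterns for these diseases and propose multi-angle treatment strategies for complex diseases.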
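MCODE begins by weighting each vertex by how densely connected its neighbourhood is: the weight is the highest k-core level of the vertex's closed neighbourhood multiplied by the density of that k-core. The sketch below covers only this vertex-weighting stage, using loop-free density; the later module-growing and post-processing stages that would yield the drug modules are omitted, and the function names are hypothetical.

```python
def k_core(nodes, adj, k):
    """Vertices of the k-core of the subgraph induced by `nodes` (adj: node -> set)."""
    core = set(nodes)
    changed = True
    while changed:
        changed = False
        for v in list(core):
            if len(adj[v] & core) < k:  # too few neighbours left inside the core
                core.remove(v)
                changed = True
    return core


def density(nodes, adj):
    """Edges present divided by edges possible (no self-loops)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(len(adj[v] & nodes) for v in nodes) / 2
    return edges / (n * (n - 1) / 2)


def mcode_vertex_weights(adj):
    """Weight(v) = highest k of a non-empty k-core of N[v] times that core's density."""
    weights = {}
    for v in adj:
        neighbourhood = adj[v] | {v}
        best_k, best_core = 0, neighbourhood
        k = 1
        while True:
            core = k_core(neighbourhood, adj, k)
            if not core:
                break
            best_k, best_core = k, core
            k += 1
        weights[v] = best_k * density(best_core, adj)
    return weights
```

Drugs inside a densely connected group, such as a triangle of mutually combined drugs, receive higher weights than a drug attached by a single edge; MCODE then grows modules outward from the highest-weighted seed vertices.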

8.
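N6,2′-O-dimethyladenosine (m6Am) is a common reversible modification of RNA molecules. Some studies have described the effects of m6Am on mRNA, but its biological functions remain insufficiently explored. We therefore propose m6AmTwins, a new end-to-end twin network that combines a Transformer (autoencoder) with a bidirectional gated recurrent unit (Bi-GRU) to detect m6Am modification using only RNA sequences. Compared with existing algorithms, the highlight of this work is the use of contrastive learning with a new loss function to train m6AmTwins, which improves the model's generalization. Based on the Twins network and a simple encoding scheme, the model achieved good results on the independent test sets of two imbalanced datasets with a positive-to-negative ratio of 1:10, with Matthews correlation coefficients (MCC) of 0.53 and 0.545, respectively. In addition, to strengthen the robustness of m6AmTwins, 10-fold cross-validation was performed on the training set, yielding MCC values of 0...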
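Reporting MCC on 1:10 imbalanced data is a sensible choice because accuracy alone is misleading there. The sketch below computes MCC from binary labels using the standard formula (written out here, not taken from the authors' code; function names are illustrative) and shows that a model predicting every site as negative reaches about 0.91 accuracy but an MCC of 0.

```python
import math


def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # by convention MCC is 0 when any row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denom if denom else 0.0


# 1 positive : 10 negatives, and a trivial model that always predicts "negative"
y_true = [1] + [0] * 10
y_pred = [0] * 11
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f}, MCC={matthews_corrcoef(y_true, y_pred):.3f}")
# prints accuracy=0.909, MCC=0.000
```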

9.
项和雨, 邹斌, 唐亮, 陈维国, 饶凯锋, 刘勇, 马梅, 杨艳. Acta Ecologica Sinica (《生态学报》), 2021, 41(17): 6883-6892
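As one of the most important biological components of aquatic ecosystems, phytoplankton is sensitive to the water environment and has received wide attention in water quality monitoring. However, aquatic environments are complex and diverse, and identifying phytoplankton accurately and efficiently is a major monitoring challenge. Current phytoplankton identification methods fall into three categories: classical morphological classification, molecular markers, and artificial-intelligence image recognition. The first two are widely used but time-consuming and labor-intensive, which hinders their large-scale application and adoption by monitoring agencies. Likewise, automated image-based classification struggles to balance high accuracy with high efficiency. Advances in deep learning offer a new approach. This paper proposes a new deep convolutional neural network, RAN-11. Built on the residual attention networks Attention-56 and Attention-92, it fuses low-level and top-level backbone features through channel alignment, streamlines the structure by adjusting the numbers of attention modules and residual blocks, and replaces ReLU with the Leaky ReLU activation function. Comparative validation used 1,036 images of 11 dominant genera from Taihu Lake. Except for Asterionella, RAN-11 achieved a precision above 90% for every dominant genus, reaching 100% for 5 genera. RAN-11 achieved a recognition accuracy of 95.67% and an inference speed of 41.5 frames/s, making it both more accurate than Attention-92 (95.19% accuracy, 23.6 frames/s) and slightly faster than Attention-56 (94.71% accuracy, 41.2 frames/s), so it balances accuracy and efficiency. The results show that (1) RAN-11 outperforms the original residual attention networks in precision, accuracy, and inference speed, and outperforms traditional image recognition methods represented by the bag-of-words model by a wider margin; and (2) fusing multi-scale features, streamlining the network structure, and optimizing the activation function are effective means of improving convolutional neural network performance. Building on classical taxonomy, this paper proposes a new residual attention network to improve phytoplankton identification and builds an automated phytoplankton recognition system that is highly accurate and easy to deploy, which is significant for automated monitoring of phytoplankton in water bodies.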
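RAN-11 inherits the attention residual learning of the residual attention network, in which a soft mask M(x) in [0, 1] reweights trunk features T(x) as H(x) = (1 + M(x)) * T(x), so a near-zero mask leaves the trunk features intact instead of erasing them. A minimal element-wise sketch, assuming a sigmoid mask and applying Leaky ReLU to the trunk output (the 0.01 slope is an assumed default, not a value from the paper; names are hypothetical), is:

```python
import math


def leaky_relu(x, negative_slope=0.01):
    # unlike ReLU, negative inputs keep a small gradient instead of becoming 0
    return x if x > 0 else negative_slope * x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_residual(trunk, mask_logits):
    """H = (1 + M) * T, with M = sigmoid(mask logits) and T = LeakyReLU(trunk)."""
    return [(1.0 + sigmoid(m)) * leaky_relu(t) for t, m in zip(trunk, mask_logits)]
```

Because the mask enters as `1 + M`, stacking many attention modules cannot drive the features toward zero, which is what makes deep residual attention networks trainable.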

10.
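Objective: Long non-coding RNAs (lncRNAs) participate in many important biological processes and are closely associated with various human diseases, so predicting lncRNA-disease associations helps in diagnosing and treating diseases and in understanding, at the molecular level, how human diseases arise and progress. Most current lncRNA-disease association prediction methods integrate lncRNA and disease information only shallowly, ignoring the deep embedding features in the network topology; moreover, they build the negative training set by randomly selecting non-associated lncRNA-disease pairs, which undermines the robustness of the predictions. Method: We propose NELDA, a network-embedding-based method for predicting potential lncRNA-disease associations. NELDA first uses lncRNA expression profiles, the Disease Ontology, and known lncRNA-disease associations to construct an lncRNA similarity network, a disease similarity network, and an lncRNA-disease association network. Four deep autoencoders then learn low-dimensional network embeddings of lncRNAs and diseases from the lncRNA and disease similarity networks and from the lncRNA-disease association network. The similarity-network embeddings of lncRNAs and diseases, and their association-network embeddings, are each concatenated and fed into one of two support vector machine (SVM) classifiers to predict lncRNA-disease associations. Finally, a weighted fusion strategy combines the outputs of the two SVM classifiers to produce the final predictions. In addition, based on known lncRNA-disease association pairs and disease semantic similarity, we design a negative-sample selection strategy that builds a relatively reliable set of non-associated lncRNA-disease pairs to improve classifier robustness: a scoring function assigns a score to each lncRNA-disease pair, and the lowest-scoring pairs are selected as non-associated samples (negative samples). Result: Ten-fold cross-validation shows that NELDA effectively predicts lncRNA-disease associations, with an AUC of 0.9827, which is 0.0627 and 0.0207 higher than the existing LDASR and LDNFSGB methods, respectively. The negative-sample selection strategy and the decision-level weighted fusion strategy also effectively improve NELDA's performance. In case studies of gastric cancer and breast cancer, supporting evidence was found in recent literature and public databases for 29 of the 40 (72.5%) predicted lncRNAs associated with these cancers. Conclusion: These results indicate that NELDA is an effective lncRNA-disease association prediction method capable of uncovering potential lncRNA-disease associations.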
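Two steps of NELDA are simple enough to sketch directly: decision-level weighted fusion of the two SVM scores, and negative sampling that keeps the lowest-scoring unlabeled pairs. The abstract gives neither the fusion weight nor the scoring function, so the weight `w` and the score used here (the highest semantic similarity between the candidate disease and any disease already linked to the lncRNA) are hypothetical stand-ins, as are all names.

```python
def weighted_fusion(scores_a, scores_b, w=0.5):
    """Decision-level fusion of two classifiers' scores: w * a + (1 - w) * b."""
    return [w * a + (1 - w) * b for a, b in zip(scores_a, scores_b)]


def select_negatives(lncrnas, diseases, known_pairs, disease_sim, n_neg):
    """Keep the n_neg unlabeled (lncRNA, disease) pairs with the lowest score.

    Hypothetical score: the highest semantic similarity between the candidate
    disease and any disease already associated with the lncRNA.
    """
    linked = {}
    for lnc, dis in known_pairs:
        linked.setdefault(lnc, set()).add(dis)
    scored = []
    for lnc in lncrnas:
        for dis in diseases:
            if (lnc, dis) in known_pairs:
                continue  # known associations are positives, never negatives
            score = max((disease_sim[dis][d2] for d2 in linked.get(lnc, ())),
                        default=0.0)
            scored.append((score, lnc, dis))
    scored.sort()  # lowest score = least likely to be a hidden association
    return [(lnc, dis) for _, lnc, dis in scored[:n_neg]]
```

Pairs whose disease resembles an already-linked disease are kept out of the negative set, which is the intuition behind choosing low-scoring pairs instead of random ones.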

11.
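Identifying compound-protein interactions is essential for drug discovery, target identification, network pharmacology, and elucidating protein function. This paper develops a representation-learning-based graph neural network model for predicting compound-protein interactions. First, the Word2vec representation learning method automatically extracts features of compounds and proteins; these features are then fed into a graph neural network prediction model, which is compared with traditional machine learning methods and previous state-of-the-art methods. The results show that the model performs better on evaluation metrics such as area under the curve and accuracy. The model was used to predict interaction probabilities for all unknown compound-protein pairs in the BindingDB database, and four of the five top-scoring predicted pairs were supported by external evidence, further demonstrating the robustness and effectiveness of the model. The model makes full use of aggregated neighbor information and node features and adaptively captures the topology of the compound-protein space, which yields high accuracy. This work provides new ideas and methods for identifying compound-protein interactions.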
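Applying Word2vec to proteins first requires turning each sequence into a "sentence" of "words". A common choice, assumed here because the abstract does not specify the segmentation, is overlapping k-mers; once word vectors are learned, a simple sequence-level feature is the mean of its word vectors. The names and the averaging step are illustrative assumptions:

```python
def protein_to_words(sequence, k=3):
    """Split a protein sequence into overlapping k-mer 'words' (stride 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def sequence_embedding(words, vectors):
    """Mean of the learned vectors of the words that have one; None if none do."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]
```

The word lists returned by `protein_to_words` can be passed as training sentences to any Word2vec implementation, for example gensim's `Word2Vec` class.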

12.
    
IRBM, 2022, 43(5): 333-339
1) Objectives: Preterm birth caused by preterm labor is one of the major health problems in the world. In this article, we present a new framework for addressing this problem by processing electrohysterographic (EHG) signals recorded during labor and pregnancy. The objective of this research is to improve the classification of labor versus pregnancy contractions with a new approach that focuses on connectivity analysis based on graph parameters representative of uterine synchronization, and that compares neural network and machine learning methods for classifying labor and pregnancy contractions.
2) Material and methods: After denoising the 16 EHG signals recorded from the abdomens of pregnant women, we applied different connectivity methods to obtain connectivity matrices; then, using graph theory, we extracted graph parameters from the connectivity matrices; finally, we tested different neural network and machine learning methods on the features obtained from both the graph and connectivity methods to classify labor and pregnancy contractions.
3) Results: The best results were obtained with logistic regression. We also show that graph parameters extracted from the connectivity matrices improve the classification results.
4) Conclusion: Graph analysis combined with machine learning methods can be a powerful tool for improving labor/pregnancy classification based on the analysis of EHG signals.
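The pipeline of connectivity matrix, then graph parameters, then classifier can be sketched for one recording. The abstract does not name the connectivity measures or graph parameters, so this sketch assumes absolute Pearson correlation between channels, a binarization threshold of 0.5, and two example parameters (mean node strength and graph density); all names are hypothetical. The resulting feature dictionary would then be fed to a classifier such as logistic regression.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def connectivity_matrix(signals):
    """Channel x channel matrix of |Pearson r|, with a zero diagonal."""
    n = len(signals)
    return [[abs(pearson(signals[i], signals[j])) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def graph_features(conn, threshold=0.5):
    """Two example graph parameters computed from a connectivity matrix."""
    n = len(conn)
    strengths = [sum(row) for row in conn]  # weighted degree of each channel
    edges = sum(1 for i in range(n) for j in range(i + 1, n) if conn[i][j] >= threshold)
    return {
        "mean_strength": sum(strengths) / n,
        "density": edges / (n * (n - 1) / 2),  # fraction of possible edges kept
    }
```

Higher strength and density would indicate more synchronized uterine activity across the 16 electrodes, which is the property the graph parameters are meant to capture.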

13.
    
In many ways, graphs are the main modality of data we receive from nature. This is due to the fact that most of the patterns we see, both in natural and artificial systems, are elegantly representable using the language of graph structures. Prominent examples include molecules (represented as graphs of atoms and bonds), social networks and transportation networks. This potential has already been seen by key scientific and industrial groups, with already-impacted application areas including traffic forecasting, drug discovery, social network analysis and recommender systems. Further, some of the most successful domains of application for machine learning in previous years (images, text and speech processing) can be seen as special cases of graph representation learning, and consequently there has been significant exchange of information between these areas. The main aim of this short survey is to enable the reader to assimilate the key concepts in the area, and position graph representation learning in a proper context with related fields.

14.
    
The important role of non-coding RNAs (ncRNAs) in the cell has made their identification a critical issue in biological research. However, traditional experimental approaches such as RT-PCR and Northern blotting are costly. With recent progress in bioinformatics and computational prediction technology, computational discovery of ncRNAs has become feasible. This paper introduces the major computational approaches to ncRNA identification, including homology search, de novo prediction, and mining of deep sequencing data. Related software tools are also compared and reviewed, along with a discussion of future improvements.

15.
    
Accurate in silico identification of compound–protein interactions (CPIs) may deepen our understanding of the mechanisms underlying drug action and thus substantially facilitate drug discovery and development. Conventional similarity- or docking-based computational methods for predicting CPIs rarely exploit latent features from the large-scale unlabeled compound and protein data now available, and are often limited to relatively small-scale datasets. In the present study, we propose DeepCPI, a novel, general and scalable computational framework that combines effective feature embedding (a representation learning technique) with powerful deep learning methods to accurately predict CPIs at a large scale. DeepCPI automatically learns implicit yet expressive low-dimensional features of compounds and proteins from a massive amount of unlabeled data. Evaluations on measured CPIs in large-scale databases such as ChEMBL and BindingDB, as well as on known drug–target interactions from DrugBank, demonstrated the superior predictive performance of DeepCPI. Furthermore, several interactions between small-molecule compounds and three G protein-coupled receptor targets (glucagon-like peptide-1 receptor, glucagon receptor, and vasoactive intestinal peptide receptor) predicted by DeepCPI were experimentally validated. The present study suggests that DeepCPI is a useful and powerful tool for drug discovery and repositioning. The source code of DeepCPI can be downloaded from https://github.com/FangpingWan/DeepCPI.

16.
    
Electroencephalogram (EEG) signals acquired from the brain provide an effective representation of a person's physiological and pathological states. Much work has been conducted to study and analyze EEG signals, with the aim of probing the current state or the evolutionary characteristics of the complex brain system. Given the complex interactions between different structural and functional brain regions, brain network analysis has received much attention and has made great progress in research on brain mechanisms. In addition, with its automatic, multi-layer and diverse feature extraction, deep learning has provided an effective and feasible solution to complex classification problems in many fields, including brain state research. Both approaches show strong ability in EEG signal analysis, but combining the two to solve difficult EEG-based classification problems is still in its infancy. Here we review the application of these two theories in EEG signal research, mainly involving brain–computer interfaces, neurological disorders and cognitive analysis. Furthermore, we develop a framework combining recurrence plots and convolutional neural networks to recognize fatigued driving. The results demonstrate that complex networks and deep learning complement each other effectively, yielding better feature extraction and classification, especially in EEG signal analysis.
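A recurrence plot turns a 1-D signal segment into a 2-D binary image that a CNN can consume: after time-delay embedding, R[i][j] = 1 when embedded states i and j lie within a distance epsilon of each other. The sketch below assumes Euclidean distance and illustrative values for the embedding dimension, delay and epsilon; the fatigue-recognition framework's actual settings are not given in the abstract, and the function names are hypothetical.

```python
import math


def delay_embed(series, dim=2, delay=1):
    """Time-delay embedding: point i is (x[i], x[i + delay], ..., x[i + (dim-1)*delay])."""
    n = len(series) - (dim - 1) * delay
    return [[series[i + k * delay] for k in range(dim)] for i in range(n)]


def recurrence_plot(series, epsilon, dim=2, delay=1):
    """Binary matrix with R[i][j] = 1 if embedded states i and j are within epsilon."""
    points = delay_embed(series, dim, delay)
    n = len(points)
    return [[1 if math.dist(points[i], points[j]) <= epsilon else 0 for j in range(n)]
            for i in range(n)]


def recurrence_rate(rp):
    """Fraction of recurrent pairs, a simple summary of the plot's texture."""
    return sum(map(sum, rp)) / (len(rp) ** 2)
```

Each EEG channel segment would become one such image (or one image channel), so the CNN learns from recurrence textures rather than from raw amplitudes.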

17.
    
Journal of Genetics and Genomics (《遗传学报》), 2021, 48(7): 540-551
The response rate of most anti-cancer drugs is limited because of the high heterogeneity of cancer and the complex mechanisms of drug action. Personalized treatment that stratifies patients into subgroups using molecular biomarkers is a promising way to improve clinical benefit. With the accumulation of preclinical models and advances in computational approaches to drug response prediction, pharmacogenomics has achieved great success over the last 20 years and is increasingly used in the clinical practice of personalized cancer medicine. In this article, we first summarize FDA-approved pharmacogenomic biomarkers and large-scale pharmacogenomic studies of preclinical cancer models such as patient-derived cell lines, organoids, and xenografts. We then comprehensively review recent developments in computational methods for drug response prediction, covering network, machine learning, and deep learning technologies, as well as strategies to evaluate immunotherapy response. Finally, we discuss challenges and propose possible solutions for further improvement.

18.
To predict the risk of Alzheimer’s Disease (AD) with a deep learning model based on brain 18F-FDG positron emission tomography (PET), 350 participants with mild cognitive impairment (MCI) were selected from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. The Convolutional Architecture for Fast Feature Embedding (CAFFE) was used as the deep learning framework, and a deep convolutional network extracted the FDG PET image features of each participant to build prediction and classification models, which were used to classify MCI-stage features and predict MCI conversion. For MCI conversion prediction, conv3 classification achieved a sensitivity of 91.02% and a specificity of 77.63%; for classifying Late Mild Cognitive Impairment (LMCI) versus Early Mild Cognitive Impairment (EMCI), conv5 classification achieved an accuracy of 72.19%, with sensitivity and specificity both approximately 73%. The model constructed in this study can therefore be used for MCI conversion prediction and is also somewhat effective in classifying EMCI and LMCI. The AD risk prediction based on this brain 18F-FDG PET deep learning model matched the expected results and provides a relatively accurate reference model for AD prediction. Despite limitations in the research process, the results offer reference and guidance for future work on accurate AD prediction models, and the study is therefore of considerable significance.

