Similar literature
 Found 19 similar articles (search time: 640 ms)
1.
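To address the low accuracy of traditional methods in protein secondary structure classification, a convolutional neural network (CNN) image classification algorithm based on the grey wolf optimizer is presented. First, the CNN parameters to be optimized are selected, and the optimizer's iteration count, number of wolves, search bounds, and spatial dimension are initialized. Second, the fitness of each individual (a candidate parameter set) is computed and the individuals are ranked by fitness to determine the historical best, second-best, and third-best solutions, which are then used to update the wolves' positions. Finally, the CNN with optimized parameters is used to classify protein secondary structures. 3D models of protein secondary structures were obtained from a protein database and converted into 2D images captured from multiple angles to form the experimental dataset. Three models, a residual network, AlexNet, and VGG16, achieved accuracies of 92.6%, 87.3%, and 88.9%, respectively; on the same dataset, the traditional support vector machine and Bayesian classifier achieved 67.0% and 53.0%. The results show that, for protein secondary structure classification, the CNN based on the grey wolf optimizer is more accurate than the traditional methods.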
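The optimization loop in this abstract (rank by fitness, pick the three leading solutions, move every wolf toward them) can be sketched as below. This is a minimal illustration of the standard grey wolf optimizer, not the authors' implementation: the function name `grey_wolf_optimize`, its parameters, and the toy objective are assumptions, and in the paper's setting `fitness` would instead train a CNN with the candidate parameters and return its validation error.

```python
import random


def grey_wolf_optimize(fitness, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimise `fitness` inside the box `bounds`; return (best_position, best_score).

    Hypothetical helper: `bounds` is a list of (low, high) pairs, one per dimension.
    """
    rng = random.Random(seed)
    # Initialise the pack uniformly inside the search bounds.
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    start = min(wolves, key=fitness)
    best_pos, best_score = list(start), fitness(start)
    for t in range(n_iter):
        # Rank by fitness: the three leaders (alpha, beta, delta) guide the pack.
        alpha, beta, delta = (list(w) for w in sorted(wolves, key=fitness)[:3])
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 toward 0
        moved = []
        for wolf in wolves:
            position = []
            for d, (lo, hi) in enumerate(bounds):
                pulls = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    pulls.append(leader[d] - A * abs(C * leader[d] - wolf[d]))
                # Average the three pulls and clip to the search bounds.
                position.append(min(max(sum(pulls) / 3.0, lo), hi))
            moved.append(position)
        wolves = moved
        candidate = min(wolves, key=fitness)
        if fitness(candidate) < best_score:
            # Keep the historical best across iterations.
            best_pos, best_score = list(candidate), fitness(candidate)
    return best_pos, best_score


if __name__ == "__main__":
    # Toy objective standing in for "CNN validation error" (an assumption).
    print(grey_wolf_optimize(lambda p: sum(x * x for x in p), [(-5.0, 5.0)] * 2))
```

Tracking the historical best separately from the current leaders means the returned score can never be worse than the best wolf in the initial pack.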

2.
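Zou Lingyun, Wang Zhengzhi, Huang Jiaomin. 《遗传学报》 (Acta Genetica Sinica), 2007, 34(12): 1080-1087
A protein must be in the correct subcellular location to perform its function. This paper uses PSI-BLAST to search protein sequences and extracts the position-specific scoring matrix from the position-specific profile as one class of protein features; the amino acid composition of four equal sequence segments and the dipeptide composition of orders 1 to 7 are computed as two further classes. Together, the three feature classes yield 12 feature vectors per protein sequence. A simple weighting function is designed to weight the feature vectors, which then serve as input to a neural network predictor. The Levenberg-Marquardt algorithm replaces the traditional error back-propagation (EBP) algorithm for adjusting the network weights and thresholds, which greatly increases training speed. Two protein datasets, with 4 and 12 subcellular locations, were evaluated with leave-one-out and 5-fold cross-validation tests, respectively, giving overall prediction accuracies of 88.4% and 83.3%. On the 4-location dataset the method outperforms ordinary BP neural networks, hidden Markov models, and fuzzy K-nearest neighbors; on the 12-location dataset it outperforms support vector machine classification. Finally, the effect of different weighting ratios for the three feature classes on prediction accuracy is discussed; results for the eight weighting ratios tested show that assigning suitable weight coefficients to each feature class can further improve prediction accuracy.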
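The two composition-based feature classes mentioned above can be sketched as follows. This is an illustration only: the function names are hypothetical, and reading "dipeptide composition of order k" as the frequency of residue pairs separated by k positions is an assumption, since the abstract does not define it.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(segment):
    """Fraction of each standard amino acid in `segment` (20 values)."""
    n = len(segment)
    return [segment.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def segment_composition(seq, parts=4):
    """Amino acid composition of `parts` equal segments, concatenated."""
    cuts = [i * len(seq) // parts for i in range(parts + 1)]
    features = []
    for start, end in zip(cuts, cuts[1:]):
        features.extend(composition(seq[start:end]))
    return features


def dipeptide_composition(seq, order=1):
    """Frequency of residue pairs (seq[i], seq[i + order]); 400 values.

    Assumption: "order" is the distance between the two residues of a pair.
    """
    n_pairs = len(seq) - order
    counts = {}
    for i in range(max(n_pairs, 0)):
        pair = seq[i] + seq[i + order]
        counts[pair] = counts.get(pair, 0) + 1
    return [counts.get(a + b, 0) / n_pairs if n_pairs > 0 else 0.0
            for a in AMINO_ACIDS for b in AMINO_ACIDS]
```

Under this reading, one vector per segment (4) plus one per dipeptide order (7) plus the PSSM-derived vector gives the 12 feature vectors the abstract reports.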

3.
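Drug development is very important but also extremely costly in labor and resources. Computer-aided prediction of drug-protein affinity can greatly accelerate it. The key to drug-target affinity prediction is an accurate and detailed representation of the information in drugs and proteins. A drug-target affinity prediction model based on deep learning and multi-level information fusion is proposed, which seeks better predictive performance by integrating multi-level information about drugs and proteins. First, the drug is represented in two forms, a molecular graph and an extended-connectivity fingerprint, which are learned by a graph convolutional neural network module and fully connected layers, respectively. Next, the protein sequence and protein k-mer features are fed into a convolutional neural network module and fully connected layers, respectively, to learn latent protein features. The features learned by the four channels are then fused, and fully connected layers make the prediction. The method is validated on two benchmark drug-target affinity datasets and compared with other existing models. The results show that the proposed model achieves better predictive performance than the baseline models, indicating that the strategy of integrating multi-level drug and protein information is effective.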

4.
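Objective: The flexible motion of proteins is important for many reactions in living organisms, and predicting that motion from a protein's spatial structure is an important problem in the study of protein structure-function relationships. Convolutional neural networks (CNNs) have been applied successfully in this field. Methods: Drawing on the PointNet method from computer vision, this study proposes a CNN model for predicting protein flexibility. The model uses pooling operations and a spatial transformer network to handle, respectively, the permutation invariance and the global rotation invariance of the three-dimensional point cloud of protein atoms. Because protein molecules differ in size, proteins of different sizes are fed to the network in mini-batches for training, and the Pearson correlation coefficient is used as the evaluation metric. In addition, to improve performance, the global features of the system are extracted by concatenating max pooling and average pooling on top of the CNN model, which strengthens the extraction of global protein information. The B-factors of 243 non-redundant proteins were used to train and test the proposed models. Results: The average Pearson correlation coefficients between predicted and experimental B-factors were 0.64 for the PointNet-based CNN model and 0.65 for the improved model, better than the widely used Gaussian network model (GNM). In particular, for predicting the flexibility of intrinsically disordered proteins, the method ...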
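Two ingredients of this abstract are easy to illustrate in isolation: the concatenated max/average pooling that makes the global feature independent of atom order, and the Pearson correlation used for evaluation. The sketch below uses plain Python lists in place of network activations; the function names and the toy feature values are assumptions, not taken from the paper.

```python
import math


def global_feature(point_features):
    """Concatenate per-channel max pooling and average pooling over all points.

    `point_features` is a list of per-atom feature vectors of equal length.
    Both poolings ignore the order of the points, so the result is
    permutation-invariant.
    """
    channels = list(zip(*point_features))
    max_pool = [max(c) for c in channels]
    mean_pool = [sum(c) / len(c) for c in channels]
    return max_pool + mean_pool


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    atoms = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]  # toy per-atom features
    # Reordering the atoms leaves the pooled feature unchanged.
    print(global_feature(atoms) == global_feature(atoms[::-1]))
```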

5.
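Residues in protein sequences are divided by relative solvent accessibility into two classes (surface/interior) and three classes (surface/intermediate/interior) for prediction. Different window widths and parameters were tried in training and prediction to obtain the best classification, and the results were compared with other existing methods. Predictions on the same dataset with different classification thresholds show that the support vector machine method outperforms neural network and information theory methods in overall prediction of protein solvent accessibility. The best classification accuracy reached 79.0% for the two-class data and 67.5% for the three-class data, indicating that the support vector machine is an effective method for predicting residue solvent accessibility.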

6.
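Huang Jing. 《生物数学学报》 (Journal of Biomathematics), 2003, 18(3): 351-356
A method is proposed for modeling protein families with neural networks. Its theoretical starting point is to use a neural network to identify the common feature patterns in a set of protein sequences from the same family; the resulting model can then be used to predict protein families. With this method, the patterns that can be identified are not restricted in length or position, and the protein sequences fed to the network during modeling and prediction do not need to be pre-aligned. Applied to twenty families from the Pfam protein database, the method achieved an average prediction accuracy of 95.5%.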

7.
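Alternative splicing arises from the regulatory process by which multi-exon genes produce multiple transcripts. With advances in high-throughput sequencing, especially RNA-seq, splice sequences and splice sites can be predicted by mining massive sequencing data. Alternative splicing has broadened our knowledge of gene structure and protein isoforms. However, existing short-read alignment software is affected by random alignments and produces many false-positive splice sites, which interfere with downstream analysis. This study finds that the structural features of the sequences flanking alternative splice sites can be extracted by deep learning models, and uses a deep convolutional neural network to identify splice sites. The model offers a high recognition rate, fast computation, strong generalization, and high robustness.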

8.
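This paper proposes a deep learning model based on convolutional and recurrent neural networks that identifies circular RNA splice sites in the human genome by analyzing genomic sequence data. First, from the preprocessed nucleotide sequences, 2 network depths, 8 convolution kernel sizes, and 3 long short-term memory (LSTM) parameters were designed, giving 16 models in 8 groups. Second, mean pooling and max pooling were tested for the pooling layer, and GC content was added to improve the model's predictive ability. Finally, predictions were made for experimentally validated circular RNAs in human seminal plasma. The results show that the model with a 32×4 convolution kernel, a depth of 1, and an LSTM parameter of 32 had the highest recognition rate: 0.9824 on the training set and an accuracy of 0.95 on the test dataset, with a correct recognition rate of 83% on the experimentally validated data. The model performs well at identifying human circular RNA splice sites.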
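As a rough illustration of the inputs described here, the sketch below computes the GC content of a nucleotide sequence and a four-column encoding of the kind a 32×4 convolution kernel could slide over. The one-hot scheme and both function names are assumptions; the abstract does not specify how the sequences were encoded.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def one_hot(seq):
    """Encode each base as a 4-element row (A, C, G, T).

    Unknown symbols such as N become an all-zero row (an assumed convention).
    """
    table = {
        "A": [1, 0, 0, 0],
        "C": [0, 1, 0, 0],
        "G": [0, 0, 1, 0],
        "T": [0, 0, 0, 1],
    }
    return [list(table.get(base, [0, 0, 0, 0])) for base in seq.upper()]
```

In such a setup the GC content would be a single scalar appended to the learned sequence features before the final classification layer.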

9.
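Prediction of protein solvent accessibility based on the support vector machine method   Total citations: 1, self-citations: 0, citations by others: 1
Residues in protein sequences are divided by relative solvent accessibility into two classes (surface/interior) and three classes (surface/intermediate/interior) for prediction. Different window widths and parameters were tried in training and prediction to obtain the best classification, and the results were compared with other existing methods. Predictions on the same dataset with different classification thresholds show that the support vector machine method outperforms neural network and information theory methods in overall prediction of protein solvent accessibility. The best classification accuracy reached 79.0% for the two-class data and 67.5% for the three-class data, indicating that the support vector machine is an effective method for predicting residue solvent accessibility.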

10.
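Based on experimentally determined tertiary structure data for class I metallothionein (MT), interatomic distance constraints are given for two characteristic structures of this protein class (the CXC and CXXC primary-structure motifs and the tertiary structure of the cysteine-metal coordination cluster). A distance geometry algorithm is then used to compute a series of possible conformations. From these, statistical analysis selects the structures with significantly smaller objective function values as tertiary structure models of the target protein. After the method's feasibility was confirmed on blue crab MT, whose structure is known, it was used to predict the tertiary structure of CAP3, a metallothionein from a plant anthracnose fungus.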

11.
Visual detection of plant diseases over a large area is time-consuming, and the results are error-prone because human evaluation is subjective. Several automatic disease detection techniques reduce detection time and improve accuracy relative to visual methods, yet they are not suitable for immediate detection. In this paper, we propose a hybrid convolutional neural network (CNN) model to speed up the detection of maize leaves infested by fall armyworm (FAW). Specifically, the proposed system combines unmanned aerial vehicle (UAV) technology, to autonomously capture images of maize leaves, with a hybrid CNN model whose parallel structure is designed to exploit the strengths of two individual models, VGG16 and InceptionV3. We compare the accuracy and training time of the proposed model with those of four existing CNN models: VGG16, InceptionV3, XceptionNet, and Resnet50. The results show that the proposed hybrid model reduces training time by 16% to 44% relative to the existing models while achieving the highest accuracy, 96.98%.

12.
Hydrological time series forecasting remains a difficult task because the series are nonlinear, non-stationary, and multi-scale. To address this difficulty and improve prediction accuracy, a novel four-stage hybrid model is proposed for hydrological time series forecasting based on the principle of ‘denoising, decomposition and ensemble’. The four stages are denoising, decomposition, component prediction, and ensemble. In the denoising stage, the empirical mode decomposition (EMD) method is used to reduce noise in the hydrological time series. Then an improved form of EMD, ensemble empirical mode decomposition (EEMD), is applied to decompose the denoised series into a number of intrinsic mode function (IMF) components and one residual component. Next, a radial basis function neural network (RBFNN) is used to predict the trend of each component obtained in the decomposition stage. In the final ensemble stage, the forecasts of all IMF and residual components from the third stage are combined by a linear neural network (LNN) model to generate the final prediction. For illustration and verification, six hydrological cases with different characteristics are used to test the effectiveness of the proposed model. The proposed hybrid model performs better than conventional single models, hybrid models without denoising or decomposition, and hybrid models based on other methods, such as wavelet analysis (WA)-based hybrid models. In addition, the denoising and decomposition strategies decrease the complexity of the series and make forecasting easier. With its effective denoising, accurate decomposition, high prediction precision, and wide applicability, the new model is very promising for complex time series forecasting. This new forecast model is an extension of nonlinear prediction models.

13.
Knowing the number of residue contacts in a protein is crucial for deriving constraints useful in modeling protein folding, protein structure, and/or scoring remote homology searches. Here we use an ensemble of bi-directional recurrent neural network architectures and evolutionary information to improve the state of the art in contact prediction using a large corpus of curated data. The ensemble is used to discriminate between two different states of residue contacts, characterized by a contact number higher or lower than the average value of the residue distribution. The ensemble achieves performances ranging from 70.1% to 73.1% depending on the radius adopted to discriminate contacts (6 Å to 12 Å). These performances represent gains of 15% to 20% over the baseline statistical predictors that always assign an amino acid to the most numerous state, and are 3% to 7% better than any previous method. Combining predictors for different radii further improves the performance. SERVER: http://promoter.ics.uci.edu/BRNN-PRED/.
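The quantity being predicted here can be made concrete with a short sketch that counts, for each residue, how many other residues lie within a given radius, and then labels each residue as above or below the mean. Using one coordinate per residue and a single global mean as the threshold are simplifying assumptions; the abstract does not give the exact definition used by the authors, and the function names are hypothetical.

```python
import math


def contact_numbers(coords, radius):
    """Number of other residues within `radius` of each residue.

    `coords` holds one (x, y, z) point per residue, e.g. a representative
    atom (an assumption); distances and radius share the same unit.
    """
    counts = [0] * len(coords)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= radius:
                counts[i] += 1
                counts[j] += 1
    return counts


def contact_states(counts):
    """Label each residue 1 if its contact number exceeds the mean, else 0."""
    mean = sum(counts) / len(counts)
    return [1 if c > mean else 0 for c in counts]
```

Changing `radius` over the 6 Å to 12 Å range changes both the counts and the mean, which is why the abstract reports a separate predictor per radius.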

14.
Designing protein sequences that fold to a given three-dimensional (3D) structure has long been a challenging problem in computational structural biology with significant theoretical and practical implications. In this study, we first formulated this problem as predicting the residue type given the 3D structural environment around the Cα atom of a residue, repeated for each residue of a protein. We designed a nine-layer 3D deep convolutional neural network (CNN) that takes as input a gridded box with the atomic coordinates and types around a residue. Several CNN layers were designed to capture structure information at different scales, such as bond lengths, bond angles, torsion angles, and secondary structures. Trained on a very large number of protein structures, the method, called ProDCoNN (protein design with CNN), achieved state-of-the-art performance when tested on large numbers of test proteins and benchmark datasets.

15.
The aim of this study was to evaluate the use of dose difference maps with a convolutional neural network (CNN) to detect multi-leaf collimator (MLC) positional errors in patient-specific quality assurance for volumetric modulated radiation therapy (VMAT). A cylindrical three-dimensional detector (Delta4, ScandiDos, Uppsala, Sweden) was used to measure 161 beams from 104 clinical prostate VMAT plans. For the simulation, error-free plans were used together with plans into which two types of MLC error were introduced: systematic error and random error. A total of 483 dose distributions in a virtual cylindrical phantom were calculated with a treatment planning system. Dose difference maps were created from two planar dose distributions taken from the measured and calculated dose distributions, and these were used as the input for the CNN, with 375 datasets assigned for training and 108 datasets assigned for testing. The CNN model had three convolution layers and was trained with five-fold cross-validation. The CNN model classified the error types of the plans as “error-free,” “systematic error,” or “random error,” with an overall accuracy of 0.944. The sensitivity values for the “error-free,” “systematic error,” and “random error” classifications were 0.889, 1.000, and 0.944, respectively, and the specificity values were 0.986, 0.986, and 0.944, respectively. This approach was superior to those based on gamma analysis. Using dose difference maps with a CNN model may provide an effective solution for detecting MLC errors for patient-specific VMAT quality assurance.
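For readers unfamiliar with how per-class sensitivity and specificity follow from a three-class result, the sketch below derives them one-vs-rest from a confusion matrix. The function name and the matrix in the usage example are hypothetical; the example values are not the study's data.

```python
def per_class_metrics(confusion):
    """Return (accuracy, [(sensitivity, specificity), ...]) for a square matrix.

    confusion[i][j] is the number of samples of true class i predicted as
    class j. Each class is scored one-vs-rest.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # missed members of class k
        fp = sum(confusion[i][k] for i in range(n)) - tp  # others labelled as k
        tn = total - tp - fn - fp
        metrics.append((tp / (tp + fn), tn / (tn + fp)))
    accuracy = sum(confusion[k][k] for k in range(n)) / total
    return accuracy, metrics


if __name__ == "__main__":
    # Hypothetical counts for three classes, for illustration only.
    example = [[8, 1, 1], [0, 10, 0], [2, 0, 8]]
    print(per_class_metrics(example))
```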

16.

Background

Human Down syndrome (DS) is usually caused by genomic micro-duplications and dosage imbalances of human chromosome 21. It is associated with many genomic and phenotypic abnormalities. Although human DS occurs in about 1 per 1,000 births worldwide, which is a very high rate, researchers have not found any effective method to cure it. Currently, the most efficient ways of preventing human DS are screening and early detection.

Methods

In this study, we used deep learning techniques to analyze a set of Illumina genotyping array data. We built a bi-stream convolutional neural network (CNN) model to screen for and predict the occurrence of DS. First, we built image input data by converting the intensities of each SNP site into chromosome SNP maps. Next, we proposed a bi-stream CNN architecture with nine layers and two branch models. We merged the two CNN branch models into one model at the fourth convolutional layer and output the prediction in the last layer.

Results

Our bi-stream CNN model achieved an average accuracy of 99.3% with very low false-positive and false-negative rates, which is necessary for further applications in disease prediction and medical practice. We further visualized the feature maps and learned filters from intermediate convolutional layers, which showed the genomic patterns and correlated SNP variations in human DS genomes. We also compared our method with other CNN and traditional machine learning models, and analyzed and discussed the characteristics and strengths of our bi-stream CNN model.

Conclusions

Our bi-stream model used two branch CNN models to learn the local genome features and regional patterns among adjacent genes and SNP sites from two chromosomes simultaneously. It achieved the best performance on all evaluation metrics when compared with two single-stream CNN models and three traditional machine learning algorithms. The visualized feature maps also provided opportunities to study the genomic markers and pathway components associated with human DS, which provided insights for gene therapy and genomic medicine developments.

17.
Wang Z, Zhao F, Peng J, Xu J. Proteomics, 2011, 11(19): 3786-3792
Compared with protein 3-class secondary structure (SS) prediction, 8-class prediction has received less attention and is also much more challenging, especially for proteins with few sequence homologs. This paper presents a new probabilistic method for 8-class SS prediction using conditional neural fields (CNFs), a recently invented probabilistic graphical model. This CNF method not only models the complex relationship between sequence features and SS, but also exploits the interdependency among SS types of adjacent residues. In addition to sequence profiles, our method also makes use of non-evolutionary information for SS prediction. Tested on the CB513 and RS126 data sets, our method achieves Q8 accuracy of 64.9% and 64.7%, respectively, which is much better than the SSpro8 web server (51.0% and 48.0%, respectively). Our method can also be used to predict other structure properties (e.g. solvent accessibility) of a protein or the SS of RNA.
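The Q8 figure quoted above is a per-residue accuracy over the eight secondary structure states. A minimal sketch of the computation follows, assuming predictions and observations are given as equal-length strings of one-letter state codes; the function name and this string representation are assumptions.

```python
def q_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state.

    With 8-state labels this is Q8; with 3-state labels the same formula
    gives Q3.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```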

18.
The subcellular location of a protein is highly related to its function. Identifying the location of a given protein is an essential step in investigating related problems. Traditional experimental methods can produce solid determinations, but their limitations, such as high cost and low efficiency, are evident. Computational methods provide an alternative means of addressing these problems. Most previous methods extract features from protein sequences or structures to build prediction models. In this study, we use two types of features and combine them to construct the model. The first feature type is extracted from a protein–protein interaction network to abstract the relationship between the encoded protein and other proteins. The second type is obtained from gene ontology and biological pathways to indicate the existing functions of the encoded protein. These features are analyzed using feature selection methods, and the final optimum features are used to build the model with a recurrent neural network as the classification algorithm. This model yields good performance, with a Matthews correlation coefficient of 0.844. A decision tree is used as a rule-learning classifier to extract decision rules. Although the performance of the decision rules is poor, they are valuable in revealing the molecular mechanism of proteins with different subcellular locations. The final analysis confirms the reliability of the extracted rules. The source code of the proposed method is freely available at https://github.com/xypan1232/rnnloc

19.
Pests are the main threats to crop growth, and precise classification of pests helps in formulating effective prevention and control strategies. To address the low efficiency of existing pest classification methods and their poor adaptability to large-scale environments, this paper proposes a new pest classification method based on a convolutional neural network (CNN) and an improved Vision Transformer model. First, MMAlNet is designed to extract features of the object to be identified at different scales and at finer granularity. Then, a classification model called DenseNet Vision Transformer (DNVT), which combines a CNN with an improved Vision Transformer model, is proposed. DNVT captures long-distance dependencies while retaining local feature modelling capability, which can effectively improve pest classification accuracy. Finally, an ensemble learning algorithm combines the classification predictions of MMAlNet and DNVT by soft voting, further improving classification accuracy. Simulation results on the D0 and IP102 datasets show that the proposed method attained maximum classification accuracies of 99.89% and 74.20%, respectively, which is better than other state-of-the-art methods and gives it high practical value.
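Soft voting, the final step described above, averages the class probabilities of the individual models and picks the class with the highest average. The sketch below shows the idea for two models; the function name, the equal default weighting, and the example probabilities are assumptions rather than details from the paper.

```python
def soft_vote(probs_a, probs_b, weight_a=0.5):
    """Combine two class-probability vectors and return (class_index, combined).

    `weight_a` is the share given to the first model; the second model
    receives 1 - weight_a.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same classes")
    combined = [weight_a * a + (1.0 - weight_a) * b
                for a, b in zip(probs_a, probs_b)]
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined


if __name__ == "__main__":
    # Hypothetical outputs of two classifiers for one image, three classes.
    print(soft_vote([0.6, 0.3, 0.1], [0.2, 0.7, 0.1]))
```

Unlike hard (majority) voting, soft voting lets a confident model outweigh an uncertain one even when the two disagree on the top class.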
