Similar literature
 19 similar articles found (search time: 109 ms)
1.
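Recognition of human Pol II promoters   Total citations: 14, self-citations: 2, citations by others: 12
Based on the characteristic base distributions of promoter and non-promoter regions, quadratic discriminant analysis based on the increment of diversity (IDQD) was applied to recognize human Pol II promoters. Recognition accuracy exceeded 90%, better than other published algorithms, including SVM classifiers. The IDQD algorithm can also predict transcription start sites (TSSs) fairly accurately: in 10-fold cross-validation, sensitivity and specificity were 86% and 91%, respectively. These results indicate that IDQD is an effective classifier.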
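
The sensitivity and specificity figures reported above come from 10-fold cross-validation. The following is a minimal sketch of how such figures are typically computed, not the authors' code. It assumes the common definitions Sn = TP/(TP+FN) and Sp = TN/(TN+FP), which the abstract does not spell out (some promoter papers define specificity as TP/(TP+FP) instead); the function names `sensitivity_specificity` and `k_fold_indices` are hypothetical.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for binary labels.

    Labels: 1 = promoter, 0 = non-promoter. Assumes both classes occur
    in true_labels, otherwise a division by zero is raised.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)   # fraction of true promoters recovered
    specificity = tn / (tn + fp)   # fraction of non-promoters rejected (assumed definition)
    return sensitivity, specificity


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

In a real evaluation, a classifier would be retrained on each `train` split and the metrics accumulated over the ten `test` splits.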

2.
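Accurately locating the translation initiation site (TIS, i.e., the 5' end of a gene) is a key problem in prokaryotic gene prediction, and genomic GC content and the diversity of translation initiation mechanisms are major factors limiting current TIS prediction performance. By combining complex information on genome structure (including GC content, the sequence adjacent to the TIS and upstream regulatory signals, coding potential, and operon structure), the authors develop a statistical model that describes translation initiation mechanisms and, on that basis, design a new TIS prediction algorithm, MED-StartPlus. MED-StartPlus is systematically compared with and evaluated against similar methods: RBSfinder, GS-Finder, MED-Start, TiCo, and Hon-yaku. Tests use two kinds of data: the 14 currently known gene datasets with confirmed TISs, and datasets of genes with known function from 300 species. The results show that the prediction accuracy of MED-StartPlus is higher overall than that of similar methods. MED-StartPlus has a clear advantage especially for genomes with high GC content and genomes with complex translation initiation mechanisms.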

3.
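Computational localization of transcription start sites is an important part of research on gene transcriptional regulation, but existing methods have low recognition performance. Building on existing prokaryotic promoter recognition algorithms, the authors propose a sliding-window method for computationally locating prokaryotic transcription start sites, which predicts the TSS position by sliding a window across the sequence within a reasonably restricted search range. First, quadratic discriminant classifiers are built from the overlapping composition features of the window sequence and from other promoter features, and are used to compute a likelihood score at each position. Next, the likelihood scores are corrected using the empirical distribution of the distance between transcription start sites and translation start sites. Finally, a threshold-based localization algorithm determines the predicted position from the distribution of the likelihood scores. Tests on real Escherichia coli sequence data show that the algorithm effectively predicts the positions of true transcription start sites. Compared with existing algorithms, at a sensitivity of about 0.85 the specificity rises from 0.20 to 0.65, improving localization accuracy by about 20 percentage points.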
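
The three steps described above (score each window, correct the score with a TSS-to-TIS distance prior, then apply a threshold) can be sketched roughly as follows. This is an illustration under stated assumptions, not the published algorithm: `score_window` stands in for the quadratic discriminant classifiers, `distance_prior` for the empirical distance distribution, and picking the highest corrected score above a fixed threshold is a simplification of the paper's threshold-based localization. All names are hypothetical.

```python
def locate_tss(sequence, window, score_window, distance_prior, tis_index, threshold):
    """Return the predicted TSS index upstream of tis_index, or None.

    score_window(subseq) -> raw likelihood-like score for one window
                            (hypothetical stand-in for the classifiers).
    distance_prior(d)    -> weight for a TSS lying d bases upstream of the
                            translation start (hypothetical stand-in for
                            the empirical distance distribution).
    """
    best_pos, best_score = None, threshold
    for start in range(0, tis_index - window + 1):
        # Assumed convention: the candidate TSS is the base right after the window.
        candidate = start + window
        raw = score_window(sequence[start:start + window])
        corrected = raw * distance_prior(tis_index - candidate)
        if corrected > best_score:   # strictly above the threshold / current best
            best_pos, best_score = candidate, corrected
    return best_pos


def at_fraction(subseq):
    """Toy score: fraction of A/T bases (promoter elements are often AT-rich)."""
    return sum(base in "AT" for base in subseq) / len(subseq)
```

With a toy sequence that has a TATAAT box 10 bases upstream of the ATG, `locate_tss(seq, 6, at_fraction, lambda d: 1.0, 24, 0.5)` selects the position just after the box; raising the threshold above every score makes the function return `None`.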

4.
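To improve the accuracy of splice site recognition in untranslated regions, a recognition method combining statistical probability with support vector machines is proposed. The method has two stages. In the first stage, statistical methods are used to describe untranslated region (UTR) sequences: features such as correlations between bases, position specificity, and conservation are expressed as probabilities, and these probability parameters serve as the input vector for the support vector machine in the second stage. In the second stage, a support vector machine (SVM) with a polynomial kernel is applied to recognize splice sites. Tests on a human 5′ UTR splice site dataset show that the method performs well at recognizing splice sites in untranslated regions.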

5.
李静  韩震  王文柳  崔艳荣 《生态科学》2019,38(4):135-141
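The tidal-flat environment is complex and variable, and some vegetation types have similar spectral characteristics. To address the low accuracy of fine-grained vegetation classification, the OverFeat convolutional neural network model pretrained on ImageNet was used, with Gaofen-2 (GF-2) satellite remote-sensing imagery as experimental data, to extract deep features of vegetation in different growth states on the Nanhui tidal flat of the Yangtze River estuary. The deep features were then fed into a support vector machine (SVM) classifier to obtain the vegetation distribution. The results show that, compared with SVM classification based on spectral features, the method used in the article achieves higher classification accuracy, with an overall accuracy of 96.08%, demonstrating that a convolutional neural network pretrained on the ImageNet dataset can recognize vegetation in different growth states well.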

6.
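Recognition of splice sites in human 5' untranslated regions based on support vector machines   Total citations: 5, self-citations: 0, citations by others: 5
Recognizing splice sites in non-coding regions of genes is a highly challenging problem in gene recognition, especially for splice sites in the 5' untranslated region. Unlike ordinary splice sites, splice sites in the 5' untranslated region are not flanked by a coding-to-non-coding state transition, so conventional splice site recognition algorithms perform poorly in untranslated regions. This article uses a support vector machine based method to recognize splice sites in the 5' untranslated region. To improve recognition accuracy, kernel parameters are selected with a method based on a matrix similarity measure, which determines suitable kernel parameters simply and quickly and thereby improves the recognition performance of the kernel. Experiments confirm that the support vector machine, after parameter selection, recognizes splice sites in the 5' untranslated region well.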

7.
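MicroRNAs (miRNAs) are a class of non-coding RNAs about 21 nt long that play important and wide-ranging roles in post-transcriptional regulation in animals and plants. Existing computational prediction methods usually cannot reliably identify pre-miRNAs whose secondary structure contains multi-branched stem-loops. To further improve pre-miRNA prediction accuracy, this paper builds on previous work by introducing a new class of multi-stem-loop biological features, combining a genetic algorithm (GA) with a support vector machine (SVM) for feature selection while simultaneously optimizing the SVM model parameters (c, g), and handling the class imbalance of the dataset, to construct a new classifier. Using human pre-miRNAs as the dataset, 5-fold cross-validation shows that the new classifier effectively improves prediction accuracy.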

8.
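Objective: To increase the expression level of soluble tumor necrosis factor receptor type I (sTNFαRI) in Escherichia coli BL21(DE3) by optimizing the secondary structure of the translation initiation region (TIR) at the 5' end of the pET11b-sTNFαRI mRNA. Methods: Based on an analysis of the free energy and nucleotide positional entropy of the secondary structure of the TIR at the 5' end of the pET11b-sTNFαRI mRNA, primers were designed to mutate the relevant codons in the TIR so that the ribosome binding site (RBS) and the start codon (AUG) were exposed outside the hairpin structure. In addition, the pET11b ribosome binding site was mutated from GAAGGAGA to GAAGAA to facilitate assembly of the translation complex and translation initiation. The optimized 5' TIR sequence was cloned together with the sTNFαRI sequence into the pET11b vector and transformed into E. coli BL21(DE3); positive transformants were induced with IPTG and analyzed by SDS-PAGE and Western blot. Results: After optimization of the secondary structure of the pET11b-sTNFαRI 5' TIR mRNA, SDS-PAGE and Western blot analysis showed that expression of recombinant sTNFαRI was 50% to 60% higher than before optimization. Conclusion: Optimizing the mRNA secondary structure of the TIR of a recombinant vector can effectively increase expression of the target protein, which is of significant value for further industrial production.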

9.
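Objective: To optimize a 5′ rapid amplification of cDNA ends (5′-RACE) experimental platform for locating the transcription start sites of Vibrio parahaemolyticus (VP) genes. Methods: Total RNA was extracted from VP and digested with rDNase I to remove any contaminating genomic DNA. T4 RNA ligase was used to ligate an oligonucleotide of known sequence to the 5′ end of the RNA, which was then reverse-transcribed into cDNA. With the cDNA as template, nested PCR was used to amplify the DNA fragment of the target gene, which was cloned directly into a T vector. Finally, the transcription start site of the target gene was determined by sequencing and alignment. Primer extension was used to further examine the transcription start site of VPA1027 in order to check the reliability of the 5′-RACE results. Results: The 5′-RACE results showed that the transcription start sites of VPA1027, scrG, scrA, cpsA, and VPA0198 were G(-103), G(-70), T(-205), C(-129), and G(-238), respectively (with the translation start site as +1); primer extension also placed the transcription start site of VPA1027 at G(-103). Conclusion: The optimized 5′-RACE protocol can precisely locate the transcription start sites of VP genes.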

10.
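Application of hidden semi-Markov models to 3′ splice site recognition (in English)   Total citations: 1, self-citations: 0, citations by others: 1
Recent gene-finding software is markedly better than earlier software, but sensitivity and specificity at the exon level are still unsatisfactory. This is because existing software is not yet effective enough at recognizing biological signal sites such as splice sites and translation initiation sites. Improving the recognition of each of these signal sites would improve overall gene-finding performance. A hidden semi-Markov model can describe the structure of the 3′ splice site (acceptor) well. An acceptor recognition algorithm developed on this basis was tested on the Burset/Guigo dataset and achieved a better recognition rate than existing algorithms. The success of the model also gives an overview of the branch site and the pyrimidine-rich region upstream of the splice site, deepening the understanding of acceptor structure and of the splicing process.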

11.
MOTIVATION: At present, computational gene identification methods for microbial genomes predict the verified translation termination site (3' end) with high accuracy, but the translation initiation site (TIS, 5' end) with much lower accuracy. The latter is important for analyzing and understanding the putative protein of a gene and the regulatory machinery of translation. Improving the accuracy of TIS prediction is one of the remaining open problems. RESULTS: In this paper, we develop a four-component statistical model to describe the TIS of prokaryotic genes. The model incorporates several biologically meaningful features, including the correlation between the translation termination site and the TIS of genes, the sequence content around the start codon, the sequence content of the consensus signal related to ribosomal binding sites (RBSs), and the correlation between the TIS and the upstream consensus signal. An entirely non-supervised training system is constructed, which takes as input a set of coding open reading frames (ORFs) annotated by any gene finder, and gives as output a set of organism-specific parameters (without any prior knowledge or empirical constants and formulas). The novel algorithm is tested on a set of reliable datasets of genes from Escherichia coli and Bacillus subtilis. MED-Start correctly predicts 95.4% of the start sites of 195 experimentally confirmed E. coli genes and 96.6% of 58 reliable B. subtilis genes. Moreover, the test results indicate that the algorithm gives higher accuracy for more reliable datasets, and is robust to variation in gene length. MED-Start may be used as a postprocessor for a gene finder. After processing by our program, the improvement in gene start prediction is remarkable: for example, the accuracy of TIS prediction by MED 1.0 increases from 61.7 to 91.5% for 854 verified E. coli genes, while that of GLIMMER 2.02 increases from 63.2 to 92.0% for the same dataset. These results show that our algorithm is one of the most accurate methods for identifying the TIS in prokaryotic genomes. AVAILABILITY: The program MED-Start can be accessed through the website of CTB at Peking University: http://ctb.pku.edu.cn/main/SheGroup/MED_Start.htm.

12.
We have introduced a new method of protein secondary structure prediction which is based on the theory of support vector machine (SVM). SVM represents a new approach to supervised pattern classification which has been successfully applied to a wide range of pattern recognition problems, including object recognition, speaker identification, gene function prediction with microarray expression profile, etc. In these cases, the performance of SVM either matches or is significantly better than that of traditional machine learning approaches, including neural networks. The first use of the SVM approach to predict protein secondary structure is described here. Unlike the previous studies, we first constructed several binary classifiers, then assembled a tertiary classifier for three secondary structure states (helix, sheet and coil) based on these binary classifiers. The SVM method achieved a good performance of segment overlap accuracy SOV = 76.2% through sevenfold cross-validation on a database of 513 non-homologous protein chains with multiple sequence alignments, which outperforms existing methods. Meanwhile, the three-state overall per-residue accuracy Q(3) reached 73.5%, which is at least comparable to existing single prediction methods. Furthermore, a useful "reliability index" for the predictions was developed. In addition, SVM has many attractive features, including effective avoidance of overfitting, the ability to handle large feature spaces, information condensing of the given data set, etc. The SVM method can be conveniently applied to many other pattern classification tasks in biology.
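
The abstract does not say how the binary classifiers were combined into the three-state classifier. One common scheme, shown here purely as an assumed illustration, is one-versus-rest: each state has its own binary classifier, and the state whose classifier returns the largest signed decision value wins. The sketch also computes Q3, the fraction of residues whose state is predicted correctly. The names `assemble_three_state` and `q3_accuracy` are hypothetical.

```python
STATES = ("H", "E", "C")  # helix, sheet (strand), coil


def assemble_three_state(decision_values):
    """Pick one state from three one-vs-rest binary classifier outputs.

    decision_values maps each state to the signed output of its binary
    classifier (assumed scheme; ties resolve to the first state in STATES).
    """
    return max(STATES, key=lambda state: decision_values[state])


def q3_accuracy(predicted, observed):
    """Three-state per-residue accuracy: matching residues / all residues."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

For example, outputs of 0.3 (helix), -1.2 (sheet) and 0.1 (coil) for one residue would be assembled into a helix prediction.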

13.
Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variations amongst species. With genome-wide SNP discovery, many genome-wide association studies are likely to identify multiple genetic variants that are associated with complex diseases. However, genotyping all existing SNPs for a large number of samples is still challenging even though SNP arrays have been developed to facilitate the task. Therefore, it is essential to select only informative SNPs representing the original SNP distributions in the genome (tag SNP selection) for genome-wide association studies. These SNPs are usually chosen from haplotypes and called haplotype tag SNPs (htSNPs). Accordingly, the scale and cost of genotyping are expected to be largely reduced. We introduce binary particle swarm optimization (BPSO) with local search capability to improve the prediction accuracy of STAMPA. The proposed method does not rely on block partitioning of the genomic region, and consistently identified tag SNPs with higher prediction accuracy than either STAMPA or SVM/STSA. We compared the prediction accuracy and time complexity of BPSO to STAMPA and an SVM-based (SVM/STSA) method using publicly available data sets. Compared with STAMPA and SVM/STSA, BPSO effectively improved prediction accuracy for both smaller and larger scale data sets. These results demonstrate that the BPSO method selects tag SNPs with higher accuracy regardless of the scale of the data sets used. © 2009 American Institute of Chemical Engineers Biotechnol. Prog., 2010

14.
张霞  李占斌  张振文  邓彦 《生态学报》2012,32(21):6788-6794
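To predict the dynamics of groundwater in the Luohui Canal irrigation district of Shaanxi, and on the basis of a comprehensive analysis of methods for studying groundwater dynamics, a prediction method for groundwater dynamics in irrigation districts based on a support vector machine and an improved BP neural network model is proposed. The corresponding programs were written in MATLAB, and the corresponding groundwater dynamics prediction models were built. Using multi-year measured data from the irrigation district as learning and test samples, the two models were compared for groundwater dynamics prediction. The study shows that both the support vector machine model and the BP network model achieve high simulation accuracy during sample training, whereas in the sample learning stage the prediction accuracy of the support vector machine is clearly better than that of the BP network, and it describes the complex coupled relationships of groundwater dynamics well. The support vector machine method is practical and better suited to predicting groundwater dynamics in large irrigation districts, and it supplements and improves on traditional methods for studying groundwater dynamics.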

15.
Prognostic prediction is important in the medical domain, because it can be used to select an appropriate treatment for a patient by predicting the patient's clinical outcomes. For high-dimensional data, a normal prognostic method undergoes two steps: feature selection and prognosis analysis. Recently, the L?-L?-norm Support Vector Machine (L?-L? SVM) has been developed as an effective classification technique and has shown good classification performance with automatic feature selection. In this paper, we extend L?-L? SVM for regression analysis with automatic feature selection. We further improve the L?-L? SVM for prognostic prediction by utilizing the information of censored data as constraints. We design an efficient solution to the new optimization problem. The proposed method is compared with seven other prognostic prediction methods on three real-world data sets. The experimental results show that the proposed method performs consistently better than the medium performance. It is more efficient than other algorithms with similar performance.

16.
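Predicting species' potential distributions: a comparison of SVM and GARP   Total citations: 2, self-citations: 0, citations by others: 2
 Species distributions are closely linked to environmental factors, so using environmental factors as the variables of species distribution models is currently the most common modeling approach; however, the great majority of species distribution models face the hard-to-solve problem of "high dimensionality with small samples". This study shows, through theory and practice, that the support vector machine (SVM) algorithm, which is based on the structural risk minimization principle, is well suited to classification problems with high dimensionality and small samples. Twenty Rhododendron species endemic to China were used as test subjects, with specimen data and 11 raster environmental data layers at 1 km × 1 km resolution as model variables, to predict their potential distributions in China. Model performance was compared through a comprehensive evaluation (expert evaluation, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC)). We implemented an SVM-based species distribution prediction system and showed experimentally that it far outperforms the widely used Genetic Algorithm for Rule-set Prediction (GARP) system in both computation speed and predictive performance.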
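
The AUC used in the evaluation above can be computed without drawing the ROC curve: it equals the probability that a randomly chosen presence point receives a higher model score than a randomly chosen absence (or background) point, with ties counted as one half. The sketch below illustrates that definition only; the score and label inputs are hypothetical and nothing here reproduces the study's data or evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: model suitability scores, one per evaluation point.
    labels: 1 for a presence point, 0 for an absence/background point.
    Assumes at least one point of each class.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # presence ranked above absence
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of presence from absence points. The pairwise loop is quadratic in the number of points, which is acceptable for small evaluation sets; a rank-based formula would be preferable for large ones.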

17.
18.
Liu H  Han H  Li J  Wong L 《In silico biology》2004,4(3):255-269
The translation initiation site (TIS) prediction problem is the problem of correctly identifying the TIS in mRNA, cDNA, or other types of genomic sequences. High prediction accuracy can help in better understanding protein coding from nucleotide sequences. This is an important step in genomic analysis to determine protein coding from nucleotide sequences. In this paper, we present an in silico method to predict translation initiation sites in vertebrate cDNA or mRNA sequences. This method consists of three sequential steps as follows. In the first step, candidate features are generated using k-gram amino acid patterns. In the second step, a small number of top-ranked features are selected by an entropy-based algorithm. In the third step, a classification model is built to recognize true TISs by applying support vector machines or ensembles of decision trees to the selected features. We have tested our method on several independent data sets, including two public ones and our own extracted sequences. The experimental results achieved are better than those reported previously using the same data sets. Our high accuracy not only demonstrates the feasibility of our method, but also indicates that there might be "amino acid" patterns around TIS in cDNA and mRNA sequences.
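
The first two steps described above (k-gram amino acid features, then entropy-based ranking) can be illustrated with a small sketch. It is not the authors' implementation: it assumes the sequences around each candidate site have already been translated into amino-acid strings, it uses only presence or absence of each k-gram, and it ranks features by plain information gain, which is a simpler stand-in for the entropy-based selection algorithm the abstract mentions. All function names are hypothetical.

```python
import math
from collections import Counter


def kgram_counts(protein, k):
    """Count every overlapping k-gram in an amino-acid string."""
    return Counter(protein[i:i + k] for i in range(len(protein) - k + 1))


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_present, labels):
    """Entropy reduction from splitting the samples on a boolean feature."""
    with_f = [y for f, y in zip(feature_present, labels) if f]
    without_f = [y for f, y in zip(feature_present, labels) if not f]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (with_f, without_f) if part)
    return entropy(labels) - remainder


def rank_kgrams(proteins, labels, k, top_n):
    """Return the top_n k-grams by information gain (ties broken alphabetically)."""
    counts = [kgram_counts(p, k) for p in proteins]
    vocabulary = sorted({gram for c in counts for gram in c})
    scored = [(information_gain([gram in c for c in counts], labels), gram)
              for gram in vocabulary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [gram for _, gram in scored[:top_n]]
```

The selected k-grams would then form the feature vector passed to the classifier in the third step.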

19.
N. Bhaskar  M. Suchetha 《IRBM》2021,42(4):268-276
Objectives: In this paper, we propose a computationally efficient Correlational Neural Network (CorrNN) learning model and an automated diagnosis system for detecting Chronic Kidney Disease (CKD). A Support Vector Machine (SVM) classifier is integrated with the CorrNN model for improving the prediction accuracy. Material and methods: The proposed hybrid model is trained and tested with a novel sensing module. We have monitored the concentration of urea in the saliva sample to detect the disease. Experiments are carried out to test the model with real-time samples and to compare its performance with conventional Convolutional Neural Network (CNN) and other traditional data classification methods. Results: The proposed method outperforms the conventional methods in terms of computational speed and prediction accuracy. The CorrNN-SVM combined network achieved a prediction accuracy of 98.67%. The experimental evaluations show a reduction in overall computation time of about 9.85% compared to the conventional CNN algorithm. Conclusion: The use of the SVM classifier has improved the capability of the network to make predictions more accurately. The proposed framework substantially advances the current methodology, and it provides more precise results compared to other data classification methods.
