Similar articles
1.
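This study explores the risk features and classification prediction models for hepatitis B virus (HBV) reactivation after precise radiotherapy in patients with primary liver cancer. A feature selection method based on a genetic algorithm is proposed to select the optimal feature subset for HBV reactivation from the initial feature set of the primary liver cancer data. Bayesian and support vector machine (SVM) classification models for HBV reactivation are built, and the classification performance of the optimal feature subset is compared with that of the initial feature set. The experimental results show that genetic-algorithm-based feature selection improves HBV reactivation classification: the optimal feature subset performs markedly better than the initial feature set. The optimal feature subset influencing HBV reactivation comprises HBV DNA level, TNM tumor stage, Child-Pugh, margin expansion, and maximum whole-liver dose. The Bayesian classifier reaches an accuracy of up to 82.89%, and the SVM up to 83.34%.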
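The genetic-algorithm search over feature subsets described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the operators (truncation selection, one-point crossover, bit-flip mutation, elitism), all parameter values, and the `toy_fitness` function are assumptions. In practice the fitness would be something like the cross-validated accuracy of a Bayesian or SVM classifier on the selected features.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              mutation_rate=0.1, seed=0):
    """Search for a 0/1 feature mask that maximises fitness(mask)."""
    rng = random.Random(seed)
    # Random initial population of binary masks (1 = feature selected).
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [ranked[0][:]]           # elitism: keep the current best mask
        parents = ranked[:pop_size // 2]    # truncation selection
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation on each gene with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)


def toy_fitness(mask):
    """Hypothetical stand-in for classifier accuracy: features 0 and 3 are
    informative, and every selected feature costs a small penalty."""
    informative = {0, 3}
    hits = sum(1 for i in informative if mask[i])
    return hits - 0.1 * sum(mask)


best_mask, best_score = genetic_feature_selection(6, toy_fitness)
```

Because of elitism, the best score in the population never decreases from one generation to the next; the penalty term in the toy fitness is one simple way to favor small subsets.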

2.
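GA/WV: a feature gene selection algorithm for mining cancer gene expression profiles   Cited by: 1 (self-citations: 0, citations by others: 1)
Identifying the set of feature genes in cancer expression profiles can advance research on cancer type classification and may also give patients better clinical diagnoses. Although some methods have succeeded in gene expression profile analysis, cancer classification from gene expression data remains a major challenge, chiefly because a general and reliable method for assessing gene importance is lacking. GA/WV is a new method for assessing the classification importance of genes from complex biological expression data. The feature gene set obtained by combining a genetic algorithm (GA) with the weighted voting (WV) classification algorithm suits not only the WV classifier but other classifiers as well. Validation of GA/WV on cancer gene expression profile datasets shows that it is a successful and reliable feature gene selection method.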

3.
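Breed classification is the foundation for conserving and using the genetic resources of livestock and poultry breeds. Traditional classification relies mainly on body conformation and appearance, but because these indicators are hard to quantify, highly similar breeds are difficult to tell apart. Machine learning algorithms show distinct advantages in breed classification from genomic information. To find the classification method best suited to Chinese cattle breeds, this study used genome-wide SNP data from 213 cattle of 7 local breeds and compared how three SNP selection methods (FST value ranking, mRMR, and Relief-F) and three machine learning algorithms (Random Forest, RF; Support Vector Machine, SVM; Naive Bayes, NB) affect breed classification accuracy. The results show that: 1) with 1500 or more SNPs selected by FST, or 1000 or more SNPs selected by mRMR, the SVM classifier reaches a classification accuracy of at least 99.47%; 2) SVM is the best-performing classification algorithm, followed by NB, while FST and mRMR are the best SNP selection methods, followed by Relief-F; 3) misclassification usually occurs between highly similar breeds. The study shows that machine learning classification models combined with genomic data are an effective way to identify local cattle breeds, and it provides a technical basis for the rapid and accurate classification of cattle breeds in China.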
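Ranking SNPs by FST and keeping the top k might look like the sketch below. The abstract does not say which FST estimator was used, so the Nei-style formula here, (H_T - H_S) / H_T with equal weight per breed, is an assumption, and the SNP names and allele frequencies are made up for illustration.

```python
def fst(freqs):
    """Nei-style F_ST for one biallelic SNP, given one allele frequency per breed."""
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)      # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                         # monomorphic SNP: no differentiation
    # Mean expected heterozygosity within breeds.
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


def top_snps(freq_table, k):
    """Return the names of the k SNPs with the highest F_ST."""
    ranked = sorted(freq_table, key=lambda snp: fst(freq_table[snp]), reverse=True)
    return ranked[:k]


# Hypothetical allele frequencies in three breeds.
snp_freqs = {
    "snp_a": [0.0, 1.0, 0.0],   # fixed differences between breeds
    "snp_b": [0.5, 0.5, 0.5],   # identical frequencies everywhere
    "snp_c": [0.2, 0.5, 0.8],   # moderate differentiation
}
selected = top_snps(snp_freqs, 2)
```

The selected SNPs would then serve as input features for a classifier such as SVM, RF, or NB.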

4.
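Two methods for identifying feature genes based on decision forests   Cited by: 1 (self-citations: 0, citations by others: 1)
DNA microarrays yield expression profile data for thousands of genes. Finding feature genes that discriminate disease and filtering out disease-irrelevant genes is the key problem in gene expression profile analysis. Exploiting the ensemble advantage of the decision forest method, two decision-forest-based feature gene identification methods are proposed. The decision forest first filters out, at a given significance level, most genes unrelated to the disease class; a statistical frequency method and a perturbation method then refine the preliminary selection according to each selected feature's contribution to classification. Finally, a neural network serves as an external classifier to evaluate the selected feature gene subsets, and the proposed methods are applied to experimental expression data for 2000 genes in 40 colon cancer tissues and 22 normal tissues. The results show that the feature genes selected by both methods discriminate the disease well and that both methods can obtain an optimal feature gene subset; the decision-forest-based statistical frequency method outperforms the perturbation method.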

5.
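Based on tumor gene expression profiles and using bioinformatics methods, this work starts from the classification of tumor versus normal tissue samples to analyze the discovery of tumor-specifically expressed genes and their expression patterns, and then discusses the characteristics of tumors at the gene expression level. First, after analyzing the characteristics of tumor gene expression profiles, a strategy based on the Relief algorithm is proposed for selecting feature genes for sample classification. Then, a support vector machine is used as the classifier to identify sample types, feature genes are selected using the classification error rate as the criterion, and tissue-specific genes that reflect the tissue composition of tumor and normal samples are excluded so as to highlight the true class characteristics of tumor samples. Finally, statistical methods are used to demonstrate, from an informatics perspective, that the specific expression of the classification feature genes in tumor tissues is genuine and widespread, and the specific expression patterns of these genes in tumor tissues are analyzed.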
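For orientation, the basic two-class Relief weighting that such a strategy builds on can be sketched as below. This is the textbook algorithm rather than the paper's variant: it visits every sample once instead of sampling at random (an assumption made here so the result is deterministic), it requires at least two samples per class, and the toy data are hypothetical.

```python
def relief_weights(X, y):
    """Basic Relief: reward features that differ on the nearest miss and
    penalise features that differ on the nearest hit."""
    n, d = len(X), len(X[0])
    # Feature ranges, used to scale per-feature differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        near_hit = X[min(hits, key=lambda k: dist(X[i], X[k]))]
        near_miss = X[min(misses, key=lambda k: dist(X[i], X[k]))]
        for j in range(d):
            weights[j] += (diff(j, X[i], near_miss) - diff(j, X[i], near_hit)) / n
    return weights


# Toy expression data: gene 0 separates the classes, gene 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
w = relief_weights(X, y)
```

Genes with large positive weights are candidate feature genes; in a pipeline like the one described, a classifier's error rate would then decide how many of the top-ranked genes to keep.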

6.
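The support vector machine (SVM) is a new type of learning machine based on statistical learning theory. This paper proposes an SVM-based method for extracting and recognizing epileptic EEG features that takes full advantage of the SVM's strong generalization ability. Compared with neural network methods, it shows a lower miss rate and better robustness, so it merits further study and has good application prospects.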

7.
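To address the problem of choosing an optimal wavelength combination when developing discrete-wavelength near-infrared discrimination instruments, an effective wavelength selection method based on correspondence analysis is proposed. Correspondence analysis was applied to 11 kinds of mushroom samples and their spectra over 701-2500 nm. After computing the distances between each sample point and the wavelength points on the correspondence map, the most robust wavelength combination was first determined from the wavelength point nearest to each sample point. On that basis, the 10, 50, and 200 nearest wavelength points of each sample point were screened, giving an SVM model built on six wavelengths (921 nm, 1376 nm, 1424 nm, 1720 nm, 2233 nm, and 2454 nm) that, with the fewest input variables, matches the predictive ability of the model built on the most robust wavelength combination.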

8.
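Hepatitis B virus (HBV) reactivation is a common complication in patients with primary liver cancer (PLC) after precise radiotherapy, and timely prediction and prevention can reduce morbidity and mortality. Studies show that redundant feature variables degrade the prediction accuracy for HBV reactivation. A feature selection method based on neighborhood component analysis (NCA) is proposed to identify the risk factors and feature combinations for HBV reactivation. Support vector machine (SVM) models, before and after Bayesian optimization, are then built to classify the key feature subsets and the initial feature set. The experimental results show that HBV DNA level, KPS score, fractionation scheme, margin expansion, V25, TNM tumor stage, and Child-Pugh are all risk factors for HBV reactivation. Among them, V25, found after NCA feature selection, is a risk factor proposed for the first time in HBV reactivation research. Under 10-fold cross-validation, the feature combination of HBV DNA level, margin expansion, and V25 reaches a prediction accuracy of 86.11%. The SVM classifier is well suited to HBV reactivation research, and the key feature combinations obtained after feature selection give better classification performance.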

9.
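Random forests: an important feature gene selection method for tumors   Cited by: 2 (self-citations: 0, citations by others: 2)
Feature selection techniques are widely used in bioinformatics, and random forests (RF) are one important feature selection method. RF was used to select feature genes from 5 gene expression profile datasets, including gastric, colon, and lung cancer; the selected genes were combined with a support vector machine (SVM) to classify the original datasets, and the feature gene selection and classification results were given a preliminary analysis. Significance analysis of microarrays (SAM) and ReliefF were also compared with RF. The results show that the feature genes selected by random forests carry more classification information and give higher classification accuracy. Given the method's own advantages in classification, random forests can be widely used as a reliable tool for analyzing gene expression profile data.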

10.
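Long non-coding RNAs (lncRNAs) are defined as RNA transcripts longer than 200 nt that lack protein-coding capacity. Studies show that lncRNAs play important roles in regulating plant growth and development, epigenetic responses, and various stress responses. Compared with humans and animals, however, research on plant lncRNAs is still in its infancy. How to accurately pick out lncRNAs from large numbers of transcripts remains one of the important problems in plant lncRNA research. This paper constructs a new dataset of plant lncRNAs and mRNAs, analyzes the sequence and structural features of the plant lncRNAs in it, and extracts features including k-mer frequency counts, secondary structure information, open reading frame information, and the geometric flexibility of the sequences. Using the support vector machine (SVM) algorithm with the jackknife test, plant lncRNAs were predicted, and the effect of fusing the various features on the prediction results was computed; the accuracy reached 96.14%.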
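As an illustration of the k-mer feature mentioned above, the sketch below turns a sequence into a fixed-length vector of normalized k-mer frequencies. The DNA alphabet, the normalization by the number of valid windows, and the example sequence are assumptions; the paper's exact encoding is not given in the abstract.

```python
from itertools import product


def kmer_frequencies(seq, k, alphabet="ACGT"):
    """Normalised frequency of every possible k-mer in seq.

    Windows containing characters outside the alphabet (e.g. N) are skipped.
    """
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: c / total for kmer, c in counts.items()}


# Hypothetical transcript fragment; k = 2 gives 4**2 = 16 features.
features = kmer_frequencies("ACGTACGT", 2)
```

Each transcript then maps to a vector of length 4**k, which can be concatenated with other features (such as ORF or secondary structure descriptors) before being passed to a classifier.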

11.
Microarray data are often extremely asymmetric in dimensionality, with thousands or even tens of thousands of genes but only a few hundred samples or fewer. Such extreme asymmetry between the dimensionality of genes and samples can lead to inaccurate diagnosis of disease in the clinic. Therefore, it has been shown that selecting a small set of marker genes can lead to improved classification accuracy. In this paper, a simple modified ant colony optimization (ACO) algorithm is proposed to select tumor-related ma...

12.
We propose a new method for tumor classification from gene expression data, which consists of three main steps. First, the original DNA microarray gene expression data are modeled by independent component analysis (ICA). Second, the most discriminant eigenassays extracted by ICA are selected by the sequential floating forward selection technique. Finally, a support vector machine is used to classify the modeled data. To show the validity of the proposed method, we applied it to classify three DNA microarray datasets involving various human normal and tumor tissue samples. The experimental results show that the method is efficient and feasible.

13.
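Liu Yujie, Liu Yihui. 生物信息学 (Bioinformatics), 2011, 9(3): 255-258, 262
Feature extraction and classification are key problems in pattern recognition. Combining wavelet analysis theory with support vector machine theory, a classifier model is constructed that divides prostate cancer gene chip data into two classes, cancer and normal. Wavelet low-frequency coefficients are extracted to represent the original data and fed into the support vector machine classifier. Experiments show that with the low-frequency coefficients from a 4-level db1 wavelet decomposition, the correct classification rate reaches 93.53%; with the Haar wavelet it is 92.94%. Extracting the low-frequency coefficients of different wavelets therefore gives similar classification results.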
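Extracting low-frequency (approximation) coefficients from a multi-level wavelet decomposition can be sketched as follows. The sketch uses the orthonormal Haar averaging filter, which standard references treat as the same wavelet as db1; since the abstract reports the two separately, the authors' exact setup may differ. Padding odd-length signals by repeating the last value is also an assumption, and the input profile is made up.

```python
import math


def haar_approximation(signal, levels):
    """Approximation coefficients after `levels` Haar decomposition steps."""
    coeffs = list(signal)
    for _ in range(levels):
        if len(coeffs) % 2:
            coeffs.append(coeffs[-1])   # pad odd lengths by repeating the last value
        # Low-pass step: scaled pairwise sums halve the length at each level.
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
                  for i in range(0, len(coeffs), 2)]
    return coeffs


# Hypothetical 16-point expression profile; 4 levels leave a single coefficient.
profile = [float(v) for v in range(1, 17)]
low_freq = haar_approximation(profile, 4)
```

Each level halves the feature dimension, so a 4-level decomposition shrinks a profile to roughly one sixteenth of its original length before classification.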

14.
    
Mechanisms through which tissues are formed and maintained remain unknown but are fundamental aspects of biology. Tissue-specific gene expression is a valuable tool to study such mechanisms. But in many biomedical studies, cell lines, rather than human body tissues, are used to investigate biological mechanisms. Whether or not cell lines maintain their tissue-specific characteristics after they are isolated and cultured outside the human body remains to be explored. In this study, we applied a novel computational method to identify core genes that contribute to the differentiation of cell lines from various tissues. Several advanced computational techniques, such as the Monte Carlo feature selection method, the incremental feature selection method, and the support vector machine (SVM) algorithm, were incorporated in the proposed method, which extensively analyzed the gene expression profiles of cell lines from different tissues. As a result, we extracted a group of functional genes that can indicate the differences of cell lines in different tissues and built an optimal SVM classifier for identifying cell lines in different tissues. In addition, a set of rules for classifying cell lines was also reported, which can give a clearer picture of cell lines in different tissues, although its performance was not better than that of the optimal SVM classifier. Finally, we compared these genes with the tissue-specific genes identified by the Genotype-Tissue Expression project. Results showed that most expression patterns between tissues were retained in the derived cell lines, although some genes showed distinct tissue specificity.
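Incremental feature selection, as named in the abstract above, is commonly run as follows: take a ranked feature list, score a classifier on the top 1, top 2, ... features, and keep the prefix with the best score. The sketch below illustrates that loop under stated assumptions: a leave-one-out 1-nearest-neighbor classifier stands in for the SVM, and the data and the ranking are hypothetical.

```python
def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the given feature indices."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features):
    """Return the best-scoring prefix of the ranked feature list and its accuracy."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        acc = loo_1nn_accuracy(X, y, ranked_features[:k])
        if acc > best_acc:          # ties keep the smaller feature set
            best_k, best_acc = k, acc
    return ranked_features[:best_k], best_acc


# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
X = [[0.0, 5.0], [0.1, -5.0], [1.0, 5.1], [1.1, -5.1]]
y = [0, 0, 1, 1]
selected, accuracy = incremental_feature_selection(X, y, [0, 1])
```

In this toy case, adding the noisy second feature makes every nearest neighbor come from the wrong class, so the loop keeps only the first feature.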

15.
  Cited by: 3 (self-citations: 0, citations by others: 3)
Yan X  Chao T  Tu K  Zhang Y  Xie L  Gong Y  Yuan J  Qiang B  Peng X 《FEBS letters》2007,581(8):1587-1593

16.
Pluripotent stem cells are able to self-renew and to differentiate into all adult cell types. Many studies report data describing these cells and characterize them in molecular terms. Machine learning yields classifiers that can accurately identify pluripotent stem cells, but there is a lack of studies yielding minimal sets of best biomarkers (genes/features). We assembled gene expression data of pluripotent stem cells and non-pluripotent cells from the mouse. After normalization and filtering, we applied machine learning, classifying samples into pluripotent and non-pluripotent with high cross-validated accuracy. Furthermore, to identify minimal sets of best biomarkers, we used three methods: information gain, random forests, and a wrapper of genetic algorithm and support vector machine (GA/SVM). We demonstrate that the GA/SVM biomarkers work best in combination with each other; pathway and enrichment analyses show that they cover the widest variety of processes implicated in pluripotency. The GA/SVM wrapper yields the best biomarkers, no matter which classification method is used. The consensus best biomarker based on the three methods is Tet1, implicated in pluripotency just recently. The best biomarker based on the GA/SVM wrapper approach alone is Fam134b, possibly a missing link between pluripotency and some standard surface markers of unknown function processed by the Golgi apparatus.

17.
Gene expression profiles of 14 common tumors and their counterpart normal tissues were analyzed with machine learning methods to address the problem of selecting tumor-specific genes and analyzing their differential expression in tumor tissues. First, a variation of the Relief algorithm, the “RFE_Relief algorithm”, was proposed to learn the relations between genes and tissue types. Then, a support vector machine was employed to find the gene subset with the best classification performance for distinguishing cancerous tissues from their counterparts. After tissue-specific genes were removed, cross-validation experiments were employed to demonstrate the common deregulated expression of the selected genes in tumor tissues. The results indicate the existence of a specific expression fingerprint of these genes that is shared across different tumor tissues, and the hallmarks of the expression patterns of these genes in cancerous tissues are summarized at the end of this paper.

18.
Xu W  Wang M  Zhang X  Wang L  Feng H 《Bioinformation》2008,2(7):301-303
Gene selection aims to detect the most significantly expressed genes under different conditions in expression data. The current challenge in gene selection is the comparison of a large number of genes with limited patient samples, so it is not a trivial task for simple statistical analysis. Various statistical measurements are adopted by filter methods applied in gene selection studies. Their ability to discriminate phenotypes is crucial in classification and selection. Here we describe the standard deviation error distribution (SDED) method for gene selection. It utilizes within-class and among-class variations in gene expression data. We tested the method using 4 leukemia datasets available in the public domain. The method was compared with the GS2 and CHO methods. The prediction accuracies of SDED are better than those of both GS2 and CHO for the different datasets: they are 0.8-4.2% and 1.6-8.4% higher than those of GS2 and CHO, respectively. The related OMIM annotations and KEGG pathway analyses verified that SDED can pick out 4.0% and 6.1% more genes with biological significance than GS2 and CHO, respectively.

19.
  Cited by: 1 (self-citations: 0, citations by others: 1)
Gene expression profiles of 14 common tumors and their counterpart normal tissues were analyzed with machine learning methods to address the problem of selecting tumor-specific genes and analyzing their differential expression in tumor tissues. First, a variation of the Relief algorithm, the "RFE_Relief algorithm", was proposed to learn the relations between genes and tissue types. Then, a support vector machine was employed to find the gene subset with the best classification performance for distinguishing cancerous tissues from their counterparts. After tissue-specific genes were removed, cross-validation experiments were employed to demonstrate the common deregulated expression of the selected genes in tumor tissues. The results indicate the existence of a specific expression fingerprint of these genes that is shared across different tumor tissues, and the hallmarks of the expression patterns of these genes in cancerous tissues are summarized at the end of this paper.

20.
ABSTRACT:

This review describes information concerning positive selection vectors: their mechanism, classification, properties, and limitations. A total of 72 positive selection vectors were collected and discussed. Positive selection vectors can reduce background and directly screen transformants containing cloned DNA fragments. The mechanisms used to perform positive selection include insertional inactivation and the replacement of functional genes of the vectors. In general, the former is much more convenient than the latter. The functional genes are controlled either by their own promoters or by introduced heterologous promoters. On the basis of their structures, positive selection vectors can be classified into five groups. The positive selection vectors are commonly based on the mechanisms of lethal genes and sensitivity to compounds. The vectors, with sizes ranging from 2.6 to 17.0 kb, have diverse genetic markers and wide host ranges, including Escherichia coli, Bacillus, Streptomyces, lactic acid bacteria, yeasts, and mammalian cells. Although some positive selection vectors have limitations, they are useful in recombinant DNA experiments.
