Similar articles
1.
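Classification of gene expression profile data based on the Fiedler vector   Total citations: 1; self-citations: 0; citations by others: 1
This work introduces a graph-based clustering algorithm built on the Fiedler vector into tumor classification from gene expression profile data. The method builds a complete Laplacian graph over all samples from the different classes using Gaussian weights, obtains the Fiedler vector by SVD, and classifies the gene expression profiles according to the sign of the Fiedler vector component associated with each sample. Simulation experiments on synthetic data and experiments on real data for two leukemia subtypes (ALL and AML) and for colon cancer demonstrate the effectiveness of the method.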
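
The sign-based split described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the `sigma` parameter are hypothetical, and the Fiedler vector is found by shifted power iteration instead of the SVD mentioned in the abstract, so that the sketch needs only the standard library.

```python
import math

def gaussian_weights(points, sigma=1.0):
    """Complete graph with w_ij = exp(-||xi - xj||^2 / (2 sigma^2)), zero diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w

def fiedler_vector(w, iters=500):
    """Eigenvector of the second-smallest eigenvalue of L = D - W.

    Power iteration on (shift*I - L), with the constant vector
    (eigenvalue 0 of L) projected out at every step.
    """
    n = len(w)
    deg = [sum(row) for row in w]
    shift = 2.0 * max(deg)  # Gershgorin bound on the largest eigenvalue of L
    x = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        mean = sum(x) / n
        x = [v - mean for v in x]
        # y = (shift*I - D + W) x
        y = [(shift - deg[i]) * x[i] + sum(w[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    mean = sum(x) / n
    return [v - mean for v in x]

def classify_by_sign(points, sigma=1.0):
    """Assign each sample to one of two groups by the sign of its component."""
    f = fiedler_vector(gaussian_weights(points, sigma))
    return [1 if v > 0 else 0 for v in f]

# Hypothetical toy data: two well-separated groups of 1-D "profiles".
labels = classify_by_sign([(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)])
```

Which group receives label 1 depends on the starting vector; only the partition itself is meaningful.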

2.
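A study of tumor-specific gene expression patterns based on gene expression profiles   Total citations: 1; self-citations: 1; citations by others: 0
Using tumor gene expression profiles and bioinformatics methods, this study starts from the classification of tumor versus normal tissue samples to analyze the discovery of tumor-specific expressed genes and their expression patterns, and then discusses the characteristics of tumors at the level of gene expression. First, based on an analysis of the characteristics of tumor gene expression profiles, a strategy based on the Relief algorithm is proposed for selecting feature genes for sample classification. Next, a support vector machine is used as the classifier to identify sample types, feature genes are selected using the classification error rate as the criterion, and tissue-specific expressed genes that reflect differences in tissue composition between tumor and normal samples are excluded so as to highlight the true class characteristics of the tumor samples. Finally, statistical methods are used to show, from an informatics perspective, that the specific expression of the feature genes in tumor tissue is both real and widespread, and the specific expression patterns these genes display in tumor tissue are analyzed.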
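
To make the Relief-based gene ranking concrete, here is a minimal sketch of the basic two-class Relief weight update. It is an assumption-laden illustration and not the selection strategy of the paper: `relief_weights` is a hypothetical name, every sample is visited once instead of being drawn at random, features are assumed to be scaled to [0, 1], and each class must contain at least two samples.

```python
def relief_weights(samples, labels):
    """Basic two-class Relief: reward features that differ on the nearest
    sample of the other class (miss) and agree on the nearest sample of
    the same class (hit)."""
    n, m = len(samples), len(samples[0])
    weights = [0.0] * m

    def dist(a, b):
        # Manhattan distance between two feature vectors
        return sum(abs(u - v) for u, v in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and labels[j] == labels[i]]
        misses = [j for j in range(n) if labels[j] != labels[i]]
        hit = min(hits, key=lambda j: dist(samples[i], samples[j]))
        miss = min(misses, key=lambda j: dist(samples[i], samples[j]))
        for f in range(m):
            weights[f] += (abs(samples[i][f] - samples[miss][f])
                           - abs(samples[i][f] - samples[hit][f])) / n
    return weights

# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [(0.0, 0.3), (0.1, 0.9), (1.0, 0.2), (0.9, 0.8)]
y = [0, 0, 1, 1]
w = relief_weights(X, y)
```

Genes would then be ranked by weight, with the highest-weighted genes passed on to the classifier.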

3.
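When ranking candidate genes, support vector data description (SVDD) can be used to describe heterogeneous data sources such as sequence data, academic literature data, and various kinds of biological experimental data. Because biological experimental data are noisy, an SVDD description of them is affected by that noise. This study extends the original SVDD through a formula derivation and proposes uncertain support vector data description (USVDD) to reduce the influence of noise. Experiments on yeast gene expression data show that the method describes noisy data better than standard SVDD.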

4.
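苏洪全, 朱义胜, 姜玉梅. 《生物信息学》, 2010, 8(4): 356-358, 363
Serial analysis of gene expression (SAGE) yields a type of gene expression data that reflects dynamic changes inside the cell. Pattern recognition and visualization methods are basic tools for analyzing SAGE data, but traditional cluster analysis techniques are not suited to SAGE data because they do not describe the statistical properties of the data. This paper proposes a method for analyzing SAGE data based on multiclass classification and support vector machines. In analyses of simulated data and human cancer SAGE data, the one-against-one (OAO) multiclass support vector machine algorithm with a radial basis function kernel gave better classification results than PoissonC and PoissonS.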

5.
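To address the time-consuming trial-and-error search for the number of neighbors in the Local Linear Embedding (LLE) algorithm, an Enhanced Kernel Local Linear Embedding (EKLLE) algorithm is proposed that assigns neighborhoods to samples automatically. The algorithm uses a Gaussian kernel to improve the distance metric of standard LLE and combines it with the class labels of the samples to set a different number of neighbors for each sample without manual intervention, avoiding the large amount of time that trial and error needs to reach the best result. Dimensionality reduction and classification of test samples are then carried out with a different number of neighbors for each sample. EKLLE effectively maps high-dimensional gene expression profile data into a low-dimensional intrinsic space and overcomes the poor handling of noisy or sparse data by the conventional LLE algorithm. Comparative tumor sample classification experiments against other methods verify the real-time performance and accuracy of the proposed method.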

6.
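Gene regulatory network models provide a new research framework and platform for a deeper understanding of the nature of life. As one kind of gene regulatory network model, the mutual information relevance network model uses entropy and mutual information to describe associations between genes. This paper describes how mutual information is used to measure the similarity of gene expression, proposes a Bootstrap-based algorithm for estimating mutual information, and proposes an improvement strategy for the bias that arises. Experimental results show that the improved mutual information estimator can effectively increase the accuracy of gene expression similarity estimates.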
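
As a rough illustration of measuring expression similarity with mutual information and a bootstrap, the sketch below computes the plug-in estimate on already discretized profiles and summarizes it over resamples. The function names, the number of resamples, and the discretization into integer levels are assumptions made here; the abstract does not specify the estimator or the bias correction used in the paper.

```python
import math
import random
from collections import Counter

def mutual_information(x, y):
    """Plug-in mutual information (in bits) of two discretized profiles."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def bootstrap_mi(x, y, n_boot=200, seed=0):
    """Mean and standard deviation of the plug-in estimate over resamples
    of the paired observations (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(mutual_information([x[i] for i in idx],
                                            [y[i] for i in idx]))
    mean = sum(estimates) / n_boot
    var = sum((e - mean) ** 2 for e in estimates) / n_boot
    return mean, math.sqrt(var)

# Hypothetical discretized profiles (0 = low, 1 = high expression).
gene_a = [0, 1, 0, 1]
gene_b = [0, 0, 1, 1]
same = mutual_information(gene_a, gene_a)         # identical profiles
independent = mutual_information(gene_a, gene_b)  # unrelated profiles
```

The spread of the bootstrap estimates gives a sense of how much a single plug-in value can be trusted for small sample sizes, which is where bias tends to matter.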

7.
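毛学刚, 魏晶昱. 《生态学杂志》, 2017, 28(11): 3711-3719
Identification of forest stand types is one of the core problems in forest resource monitoring. To study object-oriented classification of stand types using multi-source remote sensing data jointly, Radarsat-2 data and QuickBird imagery were combined for object-oriented classification. Three segmentation schemes were used: segmentation with QuickBird imagery alone; segmentation with Radarsat-2 data alone; and joint Radarsat-2 & QuickBird segmentation. Each scheme used 10 segmentation scales (25 to 250, step 25), and the modified Euclidean distance 3 index was applied to evaluate the segmentation results and determine the optimal scheme and scale. Based on the optimal segmentation, and using different combinations of terrain, height, spectral, and common features, a support vector machine (SVM) classifier with a radial basis function (RBF) kernel was applied to identify three stand types: Chinese fir forest, Masson pine forest, and broad-leaved forest. The results show that, compared with using either data source alone, the joint Radarsat-2 and QuickBird scheme has advantages for object-oriented stand type classification. The joint Radarsat-2 & QuickBird segmentation scheme gave the best segmentation at the optimal scale parameter of 100. Based on the optimal segmentation, object-oriented stand type identification using all features extracted from both data sources achieved the highest accuracy (overall accuracy 86%, Kappa 0.86). These results can serve as a reference for combining multi-source remote sensing data in stand type identification and are of practical significance for forest resource inventory and monitoring.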

8.
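Classification of protein quaternary structure based on support vector machines and Bayesian methods   Total citations: 4; self-citations: 2; citations by others: 4
Protein quaternary structure was classified using two methods, support vector machines and Bayesian classification. The support vector machine gave the best results: under 10-fold cross-validation (10CV), its overall classification accuracy, correct prediction rate for positive samples, Matthews correlation coefficient, and false positive rate were 74.2%, 84.6%, 0.474, and 38.9%, respectively. The Bayesian classifier performed less well than the support vector machine but had the lowest false positive rate under 10CV (15.9%). These results indicate that the primary sequences of homo-oligomeric proteins contain quaternary structure information, and that the feature vectors do capture essential information buried in the contact surfaces where the associated subunits interact.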
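
The four figures reported in this abstract are standard functions of a binary confusion matrix. The sketch below shows how they are usually computed; `binary_metrics` is a hypothetical helper and the counts in the example are invented for illustration, not taken from the paper.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, false positive rate, and Matthews correlation
    coefficient from the four cells of a binary confusion matrix."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)          # correct prediction rate for positives
    false_positive_rate = fp / (fp + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, false_positive_rate, mcc

# Hypothetical counts, chosen only to exercise the formulas.
acc, sens, fpr, mcc = binary_metrics(tp=8, fn=2, tn=6, fp=4)
```

A classifier can trade sensitivity against false positive rate, which is why the abstract reports both alongside the Matthews coefficient.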

9.
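Skin autofluorescence spectra from 63 patients with type II diabetes and 140 healthy subjects were divided into a training set and a test set. For four commonly used kernel functions, the optimal classification parameters were computed by cross-validation and grid search; models were then built on the training set and used to classify the test set. The radial basis function kernel gave the best classification of the four. On this basis, a mixed kernel combining the linear kernel and the radial basis function kernel was constructed; it classified human skin autofluorescence spectra better than the radial basis function kernel alone, with an accuracy of 82.61%, a sensitivity of 69.57%, and a specificity of 95.65%. The results show that support vector machines can be used to classify human skin autofluorescence spectra and can help improve the accuracy of diabetes screening.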
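
One common way to build a mixed kernel from a linear and a radial basis function kernel is a convex combination, sketched below. The abstract does not state how the two kernels were combined, so the weighting scheme, the `weight` and `gamma` parameters, and the function names are all assumptions.

```python
import math

def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mixed_kernel(x, y, weight=0.5, gamma=1.0):
    """Convex combination of the two kernels, with 0 <= weight <= 1.

    A non-negative weighted sum of valid kernels is itself a valid kernel,
    so the result can be used wherever an SVM expects a kernel function.
    """
    return weight * linear_kernel(x, y) + (1.0 - weight) * rbf_kernel(x, y, gamma)

# Hypothetical two-point example.
k = mixed_kernel([1.0, 0.0], [0.0, 1.0], weight=0.5, gamma=1.0)
```

In a setting like the one described, `weight` and `gamma` would be tuned together with the SVM penalty parameter by cross-validated grid search.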

10.
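陈磊, 刘毅慧. 《生物信息学》, 2011, 9(3): 229-234
Gene chip technology is an important research tool in genomics. Gene chip data (microarray data) are usually high-dimensional, which makes dimensionality reduction a necessary step in microarray data analysis. This paper analyzes the lung cancer microarray data provided by G. J. Gordon et al. of Harvard Medical School. Feature attributes are extracted from the microarray data with the t-test and the Wilcoxon rank-sum test, respectively. A classification tree is then grown to full extent on the extracted features using the CART (Classification and Regression Tree) algorithm with the Gini diversity index as the error function, and is pruned to find the tree of optimal size, with the aim of improving generalization so that the tree adapts well to new data. Experiments show that the method reaches a stable classification recognition rate above 96% on the lung cancer microarray data, and yields easily understood classification rules and key genes for classification.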
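
The core step of growing a CART tree with the Gini index is choosing, for one feature, the threshold that minimizes the weighted impurity of the two child nodes. The sketch below shows that single step on one gene; the function names and the toy data are hypothetical, and tree growth and pruning are left out.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (weighted impurity, threshold) of the best binary split
    of one feature, trying midpoints between consecutive distinct values."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate equal values
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2.0
        if score < best[0]:
            best = (score, threshold)
    return best

# Hypothetical expression levels of one gene in four samples.
score, threshold = best_split([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1])
```

A full tree repeats this search over all selected genes at every node, which is how the readable "gene above threshold" rules mentioned in the abstract arise.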

11.
We propose a new method for tumor classification from gene expression data, which consists of three main steps. First, the original DNA microarray gene expression data are modeled by independent component analysis (ICA). Second, the most discriminant eigenassays extracted by ICA are selected by the sequential floating forward selection technique. Finally, a support vector machine is used to classify the modeled data. To show the validity of the proposed method, we applied it to classify three DNA microarray datasets involving various human normal and tumor tissue samples. The experimental results show that the method is efficient and feasible.

12.
Tumor-specific gene expression patterns with gene expression profiles   Total citations: 1; self-citations: 0; citations by others: 1
Gene expression profiles of 14 common tumors and their counterpart normal tissues were analyzed with machine learning methods to address the problem of selection of tumor-specific genes and analysis of their differential expressions in tumor tissues. First, a variation of the Relief algorithm, the "RFE_Relief algorithm", was proposed to learn the relations between genes and tissue types. Then, a support vector machine was employed to find the gene subset with the best classification performance for distinguishing cancerous tissues and their counterparts. After tissue-specific genes were removed, cross validation experiments were employed to demonstrate the common deregulated expressions of the selected genes in tumor tissues. The results indicate the existence of a specific expression fingerprint of these genes that is shared in different tumor tissues, and the hallmarks of the expression patterns of these genes in cancerous tissues are summarized at the end of this paper.

13.
Gene expression profiles of 14 common tumors and their counterpart normal tissues were analyzed with machine learning methods to address the problem of selection of tumor-specific genes and analysis of their differential expressions in tumor tissues. First, a variation of the Relief algorithm, the "RFE_Relief algorithm", was proposed to learn the relations between genes and tissue types. Then, a support vector machine was employed to find the gene subset with the best classification performance for distinguishing cancerous tissues and their counterparts. After tissue-specific genes were removed, cross validation experiments were employed to demonstrate the common deregulated expressions of the selected genes in tumor tissues. The results indicate the existence of a specific expression fingerprint of these genes that is shared in different tumor tissues, and the hallmarks of the expression patterns of these genes in cancerous tissues are summarized at the end of this paper.

14.
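The k-means clustering algorithm is an iterative algorithm widely used in cluster analysis of gene expression data. It usually represents relationships between genes by distance, but this cannot effectively reflect interdependence between genes. An information-theoretic k-modes clustering algorithm is therefore proposed to overcome this shortcoming. In addition, the pseudo-F statistic is introduced: on the one hand it allows points that partly overlap in space to be classified effectively, and on the other hand it gives the optimal number of clusters, which makes up for a weakness of the k-modes method. Together these make for a very effective algorithm that achieves better clustering results.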
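
The pseudo-F statistic compares between-cluster to within-cluster variation, and the number of clusters with the largest value is usually preferred. The sketch below uses the common Euclidean (Calinski-Harabasz) form as an assumption; the paper pairs the statistic with an information-theoretic k-modes method whose exact definition is not given in the abstract, and `pseudo_f` is a hypothetical name.

```python
def pseudo_f(points, labels):
    """Pseudo-F = (B / (k - 1)) / (W / (n - k)) for a given clustering.

    B is the between-cluster and W the within-cluster sum of squares.
    Requires 1 < k < n and W > 0.
    """
    n, dim = len(points), len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)

    def mean(ps):
        return [sum(p[d] for p in ps) / len(ps) for d in range(dim)]

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    grand = mean(points)
    between = sum(len(ps) * sq_dist(mean(ps), grand) for ps in clusters.values())
    within = sum(sq_dist(p, mean(ps)) for ps in clusters.values() for p in ps)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical 1-D data with two obvious groups.
pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = pseudo_f(pts, [0, 0, 1, 1])  # clusters match the groups
bad = pseudo_f(pts, [0, 1, 0, 1])   # clusters cut across the groups
```

To choose the number of clusters, one would run the clustering for several values of k and keep the k with the highest statistic.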

15.
The advent of DNA microarray technology has offered the promise of casting new insights onto deciphering secrets of life by monitoring activities of thousands of genes simultaneously. Current analyses of microarray data focus on precise classification of biological types, for example, tumor versus normal tissues. A further scientifically challenging task is to extract disease-relevant genes from the bewildering amounts of raw data, which is one of the most critical themes in the post-genomic era, but it is generally ignored due to lack of an efficient approach. In this paper, we present a novel ensemble method for gene extraction that can be tailored to fulfill multiple biological tasks including (i) precise classification of biological types; (ii) disease gene mining; and (iii) target-driven gene networking. We also give a numerical application for (i) and (ii) using a public microarray data set and set aside a separate paper to address (iii).

16.
An ensemble method for gene discovery based on DNA microarray data   Total citations: 9; self-citations: 0; citations by others: 9
DNA microarrays are now able to measure the expressions of thousands of genes simultaneously. These measurements, or gene profiling, provide a snapshot of life that maps to a cross section of genetic activities in a four-dimension space of time and the biological entity. Although recent microarray experiments[1, 2] hold the promise of the innovative technology to cast new insights onto discovery of secrets of life, development of powerful and efficient analysis strategies for microarray dat…

17.
The NetAcet method has been developed to make predictions of N-terminal acetylation sites, but more information from the data set could be utilized to improve the performance of the model. By employing a new way to extract patterns from sequences and using a sample balancing mechanism, we obtained a correlation coefficient of 0.85, and a sensitivity of 93% on an independent mammalian data set. A web server utilizing this method has been constructed and is available at http://166.111.24.5/acetylation.html.

18.
19.
Personalized medicine implies that distinct treatment methods are prescribed to individual patients according to several features that may be obtained from, e.g., the gene expression profile. The majority of machine learning methods suffer from a shortage of preceding cases, i.e. gene expression data on patients combined with the confirmed outcome of known treatment methods. At the same time, there exist thousands of various cell lines that were treated with hundreds of anti-cancer drugs in order to check the ability of these drugs to stop cell proliferation, and all these cell line cultures were profiled in terms of their gene expression.

Here we present a new approach in machine learning, which can predict clinical efficiency of anti-cancer drugs for individual patients by transferring features obtained from the expression-based data from cell lines. The method was validated on three datasets for cancer-like diseases (chronic myeloid leukemia, as well as lung adenocarcinoma and renal carcinoma) treated with targeted drugs, namely kinase inhibitors such as imatinib or sorafenib.


20.