Similar Literature
 Found 20 similar articles (search time: 140 ms)
1.
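Huang Wei (黄伟), Yin Jingyuan (尹京苑). 《生物信息学》 (Bioinformatics), 2009, 7(4): 243-247
Based on the characteristics of tumor classification and detection models, a new algorithm is proposed that combines effective gene selection and data extraction methods and then applies a support vector machine to classify or detect gene expression data. Cross-validation accuracy rose from 88.46% to 100.0% for breast cancer classification and from 71.05% to 100.0% for acute leukemia. The experimental results demonstrate the effectiveness of the method, which offers a fairly general approach to improving the accuracy of cancer detection from large volumes of gene expression data.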

2.
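Diagnosis of hepatocellular carcinoma from 31P magnetic resonance spectroscopy using support vector machines   Total citations: 1, self-citations: 1, citations by others: 0
The support vector machine is a machine learning method developed on the basis of statistical learning theory and is widely used in pattern recognition. A support vector machine model trained on 31P magnetic resonance spectroscopy data is used to classify livers, distinguishing hepatocellular carcinoma, cirrhosis, and normal liver tissue. Support vector machine classifiers with polynomial and radial basis function kernels are compared, and recognition rates for the three liver classes are obtained. The experiments show that a support vector machine classification model based on 31P magnetic resonance spectroscopy data can make diagnostic predictions for livers in vivo.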

3.
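Tumor classification based on tumor gene expression profiles is an important topic in bioinformatics. Traditional methods for extracting tumor feature information mostly rely on informative gene selection, but screening genes inevitably loses classification information. A tumor subtype feature extraction method based on adjacency matrix decomposition is proposed: a Gaussian-weighted adjacency matrix is first constructed from the tumor gene expression profile data, singular value decomposition is then applied to the adjacency matrix, and finally the row vectors of the resulting orthogonal matrix are fed as classification features into a support vector machine. Leave-one-out experiments on a gene expression data set covering two leukemia subtypes demonstrate the feasibility and effectiveness of the method.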

4.
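This article studies gastric cancer subtype classification based on microarray gene expression data. Because such data have few samples, high dimensionality, and heavy noise, dimensionality reduction is key to successful classification. The authors apply two dimensionality reduction methods, principal component analysis (PCA) and partial least squares (PLS), to gastric cancer subtype classification, using support vector machines (SVM) and k-nearest neighbors (KNN) as classifiers on two gastric cancer data sets. Classification performance is slightly better than traditional medical diagnosis, with accuracy reaching up to 100%. The results show that PCA and PLS can effectively extract classification features and greatly reduce the dimensionality of gene expression data while maintaining high classification accuracy.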
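
The classifier stage described above can be illustrated with a minimal sketch of a k-nearest-neighbor classifier scored by leave-one-out evaluation. This is not the authors' pipeline: the PCA/PLS reduction step is omitted, the evaluation scheme is an assumption (the abstract does not say how accuracy was measured), and the two-dimensional toy samples stand in for already-reduced expression features.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, labels, k=3):
    """Hold out each sample in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, samples[i], k) == labels[i]:
            correct += 1
    return correct / len(samples)


# Hypothetical reduced features (e.g. two leading components) for two subtypes.
samples = [(0.1, 0.2), (0.0, 0.4), (0.3, 0.1), (2.0, 2.1), (2.2, 1.9), (1.8, 2.3)]
labels = ["A", "A", "A", "B", "B", "B"]
accuracy = leave_one_out_accuracy(samples, labels, k=3)
```

With so few samples per class, leave-one-out is a common choice because every sample is used for both training and testing, which is why a perfect score on a small set should be read with caution.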

5.
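Because gene expression data have many attributes and few samples, the Fisher classifier does not perform well on them. This paper proposes Fisher-List, an improved version of the Fisher algorithm. Its distinctive feature is that it determines a decision threshold for each class, and each threshold incorporates both information about the overall sample population and information about individual samples that are critical for classification. Experiments show that the new algorithm outperforms Fisher, LogitBoost, AdaBoost, k-nearest neighbors, decision trees, and support vector machines on gene expression data classification.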

6.
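Objective: Different patients may respond differently to the same anticancer drug, and understanding these differences is of great reference value for precision cancer medicine. Methods: High-throughput sequencing data provide strong support for building classification models that predict anticancer drug response. For two classic data sets, the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC), this paper proposes mRMR-SVM, a computational model based on the maximum relevance minimum redundancy (mRMR) algorithm and the support vector machine (SVM). Using gene expression data, feature genes are extracted by variance ranking and the mRMR algorithm, and an SVM then performs binary "sensitive-inhibited" prediction of anticancer drug response in cell lines. Results: For the 22 drugs in CCLE, the average accuracy of mRMR-SVM is 0.904; for the 11 drugs in GDSC, it is 0.851. Conclusion: mRMR-SVM not only outperforms the conventional support vector machine, random forest, deep response forest, deep neural network, and cell line-drug complex network models in predictive performance, but also generalizes well, achieving satisfactory results in predicting anticancer drug response for three specific tissue types. In addition, mRMR-SVM can identify biomarkers closely related to the onset and progression of cancer.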
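
The mRMR selection step can be sketched as a greedy loop that, at each round, picks the feature with the highest relevance to the target minus its mean redundancy with the features already chosen. This is a minimal illustration under stated assumptions, not the paper's implementation: features are assumed to be discretized already, the "difference" form of the criterion is assumed, the gene names are hypothetical, and the variance ranking and SVM stages are left out.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, target, n_select):
    """Greedy mRMR: maximise relevance minus mean redundancy (difference form)."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = []
    while len(selected) < n_select:
        best_name, best_score = None, -math.inf
        for name, col in features.items():
            if name in selected:
                continue
            redundancy = (
                sum(mutual_information(col, features[s]) for s in selected) / len(selected)
                if selected else 0.0
            )
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Hypothetical discretized expression levels (0 = low, 1 = high) for 8 cell lines.
target = [0, 0, 0, 0, 1, 1, 1, 1]          # drug response class
features = {
    "gene_a": [0, 0, 0, 1, 1, 1, 1, 1],    # informative
    "gene_b": [0, 0, 0, 1, 1, 1, 1, 1],    # exact duplicate of gene_a (fully redundant)
    "gene_c": [0, 1, 0, 0, 1, 1, 0, 1],    # weaker, but adds different information
}
selected = mrmr_select(features, target, n_select=2)
```

In this toy case the duplicate `gene_b` is as relevant as `gene_a` but is passed over in the second round because its redundancy with `gene_a` outweighs its relevance, which is the behavior that distinguishes mRMR from ranking by relevance alone.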

7.
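Extraction of informative genes for human tumors based on SVM and mean impact value   Total citations: 1, self-citations: 0, citations by others: 1
Selecting informative genes for tumor classification from gene expression profiles is an important means of discovering tumor-specific expressed genes and exploring tumor gene expression patterns. Tumor diagnosis using classification information obtained from gene expression profiles is an important research direction in bioinformatics and is expected to become a fast and effective method of molecular tumor diagnosis in clinical medicine. Given that tumor gene expression profile data are high-dimensional, have small sample sizes, and are noisy, an algorithm is proposed that combines a support vector machine with the mean impact value to find informative tumor genes. Its advantage is that it can find multiple informative gene subsets that contain as few genes as possible while classifying as well as possible. The feasibility and effectiveness of the algorithm are verified on binary tumor data sets; for the colon cancer sample set, only 3 genes are needed to achieve 100% leave-one-out cross-validation accuracy. To avoid the influence of different partitions of the sample set on classification performance, full-fold cross-validation is further used to evaluate each informative gene subset and select more reliable ones. Compared with other tumor classification methods, the results show clear advantages in both the number of informative genes and classification performance.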

8.
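A tumor recognition model for colon cancer gene expression data is built on wavelet denoising and a support vector machine. The experimental data are decomposed by wavelets, and the wavelet function and number of decomposition levels are chosen by computing the average classification accuracy of the experimental samples with cross-validation. An energy threshold method is introduced to threshold the wavelet decomposition coefficients for denoising. A method combining gene classification contribution rate with principal component analysis is proposed to extract features from the colon cancer sample data. The strong nonlinear mapping ability of the support vector machine is used to classify the colon cancer samples nonlinearly. To reduce the influence of sample set partitioning on classification accuracy, the jackknife test is used to evaluate the support vector classifier, giving a classification accuracy of 96.77%. The results demonstrate the effectiveness of the method, which has reference value for colon cancer recognition.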

9.
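Classification of protein homo-oligomers based on support vector machines   Total citations: 14, self-citations: 1, citations by others: 13
Using support vector machines and a Bayesian method, protein homodimers, homotrimers, homotetramers, and homohexamers are classified from protein primary sequence. With support vector machines, the "one-versus-rest" and "one-versus-one" strategies give overall classification accuracies of 77.36% and 93.43%, which are 26.72 and 42.79 percentage points higher than the 50.64% overall accuracy of the Bayesian covariance discriminant method. This shows that support vector machines can be used to classify protein homo-oligomers and are a very effective method for doing so. For multi-class protein homo-oligomer classification with the same machine learning method (such as a support vector machine), the "one-versus-one" strategy works better than "one-versus-rest". The results also indicate that the primary sequence of protein homo-oligomers contains quaternary structure information.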
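
To make the two multi-class strategies concrete, the sketch below enumerates the binary subproblems each one creates for the four oligomer classes and shows majority voting for the one-versus-one case. It trains no classifier: the pairwise decisions are hypothetical placeholders, and only the accuracy figures at the end come from the abstract.

```python
from collections import Counter
from itertools import combinations

classes = ["homodimer", "homotrimer", "homotetramer", "homohexamer"]

# One-versus-rest: one binary problem per class (that class against all others).
one_vs_rest_tasks = [(c, [o for o in classes if o != c]) for c in classes]

# One-versus-one: one binary problem per unordered pair of classes.
one_vs_one_tasks = list(combinations(classes, 2))


def one_vs_one_vote(pairwise_winners):
    """Return the class that wins the most pairwise decisions."""
    return Counter(pairwise_winners.values()).most_common(1)[0][0]


# Hypothetical pairwise decisions for a single protein sequence.
winners = {
    pair: ("homotrimer" if "homotrimer" in pair else pair[0])
    for pair in one_vs_one_tasks
}
predicted = one_vs_one_vote(winners)

# Percentage-point gains over the Bayesian baseline reported in the abstract.
gain_one_vs_rest = round(77.36 - 50.64, 2)
gain_one_vs_one = round(93.43 - 50.64, 2)
```

For k classes, one-versus-rest needs k classifiers and one-versus-one needs k(k-1)/2, so four classes give 4 and 6 respectively. Each one-versus-one problem is smaller and better balanced, which is one common explanation for the accuracy gap reported above.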

10.
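A classification method for gene expression profile data based on the Fiedler vector   Total citations: 1, self-citations: 0, citations by others: 1
A graph-based clustering algorithm that uses the Fiedler vector is introduced into tumor classification of gene expression profile data. All samples, which belong to different classes, are used to construct a complete-graph Laplacian with Gaussian weights; the Fiedler vector is obtained by SVD, and the gene expression profile data are classified by the signs of the Fiedler vector components corresponding to each sample. Simulation experiments on synthetic data and experiments on real data for two leukemia subtypes (ALL and AML) and colon cancer demonstrate the effectiveness of the method.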
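
The sign-based split can be sketched in a few lines. Several choices here are assumptions rather than details from the abstract: the Gaussian weight is taken as exp(-d²/(2σ²)) with σ = 1, the toy samples are two-dimensional, and the Fiedler vector is found by shifted power iteration (to stay within the standard library) instead of the SVD the authors use.

```python
import math


def gaussian_weights(samples, sigma=1.0):
    """Complete-graph weight matrix with Gaussian edge weights (zero diagonal)."""
    n = len(samples)
    return [
        [
            0.0 if i == j
            else math.exp(-math.dist(samples[i], samples[j]) ** 2 / (2 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


def fiedler_vector(weights, iterations=500):
    """Eigenvector of the second-smallest eigenvalue of the Laplacian L = D - W."""
    n = len(weights)
    degrees = [sum(row) for row in weights]
    shift = 2 * max(degrees)  # Gershgorin bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]  # any non-constant starting vector
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant eigenvector
        # Apply (shift * I - L) so the smallest remaining eigenvalue dominates.
        w = [
            (shift - degrees[i]) * v[i] + sum(weights[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical samples: three near the origin, three near (3, 3).
samples = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (3.0, 3.1), (3.2, 2.9), (2.9, 3.2)]
vector = fiedler_vector(gaussian_weights(samples))
groups = [component > 0 for component in vector]  # sign decides the class
```

The overall sign of an eigenvector is arbitrary, so the split identifies two groups but does not by itself say which group is which tumor type; labeled samples are needed to name them.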

11.
12.

Background

Serial Analysis of Gene Expression (SAGE) is a DNA sequencing-based method for large-scale gene expression profiling that provides an alternative to microarray analysis. Most analyses of SAGE data aimed at identifying co-expressed genes have been accomplished using various versions of clustering approaches that often result in a number of false positives.

Principal Findings

Here we explore the use of seriation, a statistical approach for ordering sets of objects based on their similarity, for large-scale expression pattern discovery in SAGE data. For this specific task we implement a seriation heuristic we term ‘progressive construction of contigs’ that constructs local chains of related elements by sequentially rearranging margins of the correlation matrix. We apply the heuristic to the analysis of simulated and experimental SAGE data and compare our results to those obtained with a clustering algorithm developed specifically for SAGE data. We show using simulations that the performance of seriation compares favorably to that of the clustering algorithm on noisy SAGE data.

Conclusions

We explore the use of a seriation approach for visualization-based pattern discovery in SAGE data. Using both simulations and experimental data, we demonstrate that seriation is able to identify groups of co-expressed genes more accurately than a clustering algorithm developed specifically for SAGE data. Our results suggest that seriation is a useful method for the analysis of gene expression data whose applicability should be further pursued.

13.
Clustering analysis of SAGE data using a Poisson approach   Total citations: 3, self-citations: 1, citations by others: 2
Serial analysis of gene expression (SAGE) data have been poorly exploited by clustering analysis owing to the lack of appropriate statistical methods that consider their specific properties. We modeled SAGE data by Poisson statistics and developed two Poisson-based distances. Their application to simulated and experimental mouse retina data shows that the Poisson-based distances are more appropriate and reliable for analyzing SAGE data than other commonly used distances or similarity measures such as Pearson correlation or Euclidean distance.
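
The abstract does not give the formulas for the two distances, so the following is only one plausible illustration of a Poisson-motivated distance, not the authors' definition. It assumes a tag's count in each library is Poisson with mean equal to the tag's total count times the cluster's share for that library, and measures deviation with a chi-square-style statistic. All function names and counts are hypothetical.

```python
def cluster_profile(member_counts):
    """Share of a cluster's total tag count that falls in each library."""
    column_sums = [sum(col) for col in zip(*member_counts)]
    total = sum(column_sums)
    return [s / total for s in column_sums]


def poisson_chi2_distance(tag_counts, profile):
    """Chi-square-style deviation of a tag's counts from its Poisson expectations.

    Assumes every profile share is positive so each expected count is nonzero.
    """
    tag_total = sum(tag_counts)
    distance = 0.0
    for observed, share in zip(tag_counts, profile):
        expected = tag_total * share
        distance += (observed - expected) ** 2 / expected
    return distance


# Hypothetical cluster of two tags counted across three libraries.
profile = cluster_profile([[50, 30, 20], [100, 60, 40]])

# Same expression shape at much lower abundance: close to the cluster.
same_shape = poisson_chi2_distance([10, 6, 4], profile)

# Opposite trend across libraries: far from the cluster.
different_shape = poisson_chi2_distance([20, 30, 50], profile)
```

Because expectations scale with each tag's own total, a low-abundance tag with the same expression pattern stays close to the cluster, whereas Euclidean distance on raw counts would place it far away.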

14.
15.
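GA/WV: a feature gene selection algorithm for mining cancer gene expression profiles   Total citations: 1, self-citations: 0, citations by others: 1
Identifying the set of feature genes in cancer expression profiles can advance research on cancer type classification, and may also give patients better clinical diagnoses. Although some methods have succeeded in gene expression profile analysis, cancer classification from gene expression profile data remains a major challenge, mainly because a general and reliable method for assessing gene importance is lacking. GA/WV is a new method for assessing the classification importance of genes from complex biological expression data. The feature gene set obtained by combining a genetic algorithm (GA) with the weighted voting (WV) classification algorithm suits not only the WV classifier but also other classifiers. Validation of GA/WV on cancer gene expression profile data sets shows that it is a successful and reliable feature gene selection method.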

16.
MOTIVATION: DNA microarrays have revolutionized biological research, but their reliability and accuracy have not been extensively evaluated. Thorough testing of microarrays through comparison to dissimilar gene expression methods is necessary in order to determine their accuracy. RESULTS: We have systematically compared three global gene expression methods on all available histologically normal samples from five human organ types. The data included 25 Affymetrix high-density oligonucleotide array experiments, 23 expressed sequence tag based expression (EBE) experiments and 5 SAGE experiments. The reported gene-by-gene expression patterns showed a wide range of correlations between pairs of methods. This level of agreement was sufficient for accurate clustering of datasets from the same tissue and dissimilar methods, but highlights the need for thorough validation of individual gene expression measurements by alternate, non-global methods. Furthermore, analyses of mRNA abundance distributions indicate limitations in the EBE and SAGE methods at both high- and low-expression levels.

17.
18.
MOTIVATION: SAGE enables the determination of genome-wide mRNA expression profiles. A comprehensive analysis of SAGE data requires software that integrates (statistical) data analysis methods with a database system. Furthermore, to facilitate data sharing between users, the application should reside on a central server and be accessed via the internet. Since such an application was not available, we developed the USAGE package. RESULTS: USAGE is a web-based application that comprises an integrated set of tools offering many functions for analysing and comparing SAGE data. Additionally, USAGE includes a statistical method for the planning of new SAGE experiments. USAGE is available in a multi-user environment, giving users the option of sharing data. USAGE is interfaced to a relational database to store data and analysis results. The USAGE query editor allows the composition of queries for searching this database. Several database functions have been included which enable the selection and combination of data. USAGE provides the biologist increased functionality and flexibility for analysing SAGE data. AVAILABILITY: USAGE is freely accessible for academic institutions at http://www.cmbi.kun.nl/usage/. The source code of USAGE is freely available for academic institutions on request from the first author.

19.
Serial analysis of gene expression (SAGE) is a powerful quantification technique for gene expression data. The huge amount of tag data in SAGE libraries of samples is difficult to analyze with current SAGE analysis tools. Data is often not provided in a biologically significant way for cross-analysis and cross-comparison, thus limiting its application. Hence, an integrated software platform that can perform such a complex task is required. Here, we implement set theory for cross-analyzing gene expression data among different SAGE libraries of tissue sources; up- or down-regulated tissue-specific tags can be identified computationally. Extract-SAGE employs a genetic algorithm (GA) to reduce the number of genes among the SAGE libraries. Its representative tag mining will facilitate the discovery of the candidate genes with discriminating gene expression.
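
The set-theoretic comparison can be sketched with Python's built-in sets. This is an illustration only, not Extract-SAGE's code: the tag sequences and counts are invented, the two-fold cut-off for calling a shared tag up- or down-regulated is an assumption, counts are compared without normalizing for library size, and the genetic algorithm stage is not shown.

```python
def compare_libraries(lib_a, lib_b, fold=2.0):
    """Cross-compare two SAGE libraries given as {tag: count} dictionaries."""
    tags_a, tags_b = set(lib_a), set(lib_b)
    shared = tags_a & tags_b  # tags observed in both tissues
    return {
        "only_a": tags_a - tags_b,  # tissue-specific to library A
        "only_b": tags_b - tags_a,  # tissue-specific to library B
        "up_in_b": {t for t in shared if lib_b[t] >= fold * lib_a[t]},
        "down_in_b": {t for t in shared if lib_a[t] >= fold * lib_b[t]},
    }


# Hypothetical 10-base tags with raw counts from two tissue libraries.
brain = {"AAACCCGGGT": 40, "TTTGGGCCCA": 5, "GATTACAGAT": 12}
liver = {"AAACCCGGGT": 10, "TTTGGGCCCA": 30, "CCCTTTAAAG": 8}
result = compare_libraries(brain, liver)
```

With more than two libraries, the same difference and intersection operations can be chained, for example subtracting the union of all other tissues to find tags unique to one tissue.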

20.
Ensemble clustering methods have become increasingly important to ease the task of choosing the most appropriate cluster algorithm for a particular data analysis problem. The consensus clustering (CC) algorithm is a recognized ensemble clustering method that uses an artificial intelligence technique to optimize a fitness function. We formally prove the existence of a subspace of the search space for CC, which contains all solutions of maximal fitness, and suggest two greedy algorithms to search this subspace. We evaluate the algorithms on two gene expression data sets and one synthetic data set, and compare the result with the results of other ensemble clustering approaches.
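
As a rough illustration of what an ensemble method optimizes, the sketch below builds a pairwise agreement matrix from several input clusterings and scores a candidate partition against it. The fitness shown is an assumed form (the abstract does not define one): each pair placed in the same cluster contributes its agreement count minus a threshold, here half the number of input clusterings. The greedy search algorithms from the paper are not reproduced.

```python
from itertools import combinations


def agreement_matrix(clusterings):
    """Count, for every pair of items, how many clusterings group them together."""
    n = len(clusterings[0])
    agree = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                agree[i][j] += 1
                agree[j][i] += 1
    return agree


def fitness(partition, agree, threshold):
    """Assumed fitness: reward co-clustered pairs whose agreement exceeds the threshold."""
    return sum(
        agree[i][j] - threshold
        for i, j in combinations(range(len(partition)), 2)
        if partition[i] == partition[j]
    )


# Three hypothetical clusterings of four genes (labels are arbitrary per clustering).
clusterings = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
agree = agreement_matrix(clusterings)
threshold = len(clusterings) / 2

score_pairs = fitness([0, 0, 1, 1], agree, threshold)   # genes {0,1} and {2,3}
score_triple = fitness([0, 0, 0, 1], agree, threshold)  # genes {0,1,2} and {3}
```

Comparing labels only within each clustering makes the agreement matrix independent of how individual algorithms happen to number their clusters.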

