1.
MicroRNAs (miRNAs) are a family of short (21-23 nt) regulatory non-coding RNAs processed from long (70-110 nt) miRNA precursors (pre-miRNAs). Distinguishing true from false precursors plays an important role in the computational identification of miRNAs. Previous work has extracted numerical features from precursor sequences and their secondary structures to suit particular classification methods; however, such features may lose useful discriminative information hidden in the sequences and structures. In this study, pre-miRNA sequences and their secondary structures are used directly to construct an exponential kernel based on the weighted Levenshtein distance between two sequences. This string kernel is then combined with a support vector machine (SVM) to detect true and false pre-miRNAs. Based on 331 training samples of true and false human pre-miRNAs, 2 key SVM parameters are selected by 5-fold cross-validation and grid search, and 5 realizations with different 5-fold partitions are executed. Among 16 independent test sets (3 human, 8 animal, 2 plant, 1 virus, and 2 artificially false human pre-miRNA sets), our method statistically outperforms the previous SVM-based technique on 11 sets, including 3 human, 7 animal, and 1 false human pre-miRNA set. In particular, pre-miRNAs with multiple loops, which were usually excluded in the previous work, are correctly identified in this study with an accuracy of 92.66%.
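The kernel construction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the two cost parameters (`sub_cost`, `indel_cost`), and the `gamma` scale are assumptions, since the abstract does not give the actual weighting scheme or how sequence and structure strings are combined.

```python
import math


def weighted_levenshtein(s, t, sub_cost=1.0, indel_cost=1.0):
    """Edit distance with separate weights for substitutions and indels.

    The weights are illustrative assumptions; the abstract does not
    state the values used in the study.
    """
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = [j * indel_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [i * indel_cost]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + indel_cost,                          # delete a
                curr[j - 1] + indel_cost,                      # insert b
                prev[j - 1] + (0.0 if a == b else sub_cost),   # match or substitute
            ))
        prev = curr
    return prev[-1]


def exponential_kernel(s, t, gamma=0.1, **weights):
    """K(s, t) = exp(-gamma * d(s, t)), with d the weighted edit distance."""
    return math.exp(-gamma * weighted_levenshtein(s, t, **weights))
```

A matrix of such kernel values between all pairs of training sequences could then be passed to an SVM that accepts a precomputed kernel; `gamma` would be one natural candidate for the grid search mentioned in the abstract.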
2.
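Serial analysis of gene expression (SAGE) yields gene expression data that reflect dynamic changes within the cell. Pattern recognition and visualization methods are basic tools for analyzing SAGE data, but traditional clustering techniques are unsuitable for SAGE data because they do not account for the statistical properties of the data. This paper proposes a method for analyzing SAGE data based on multiclass classification with support vector machines. In analyses of simulated data and human cancer SAGE data, the one-against-one (OAO) multiclass support vector machine algorithm with a radial basis function kernel gave better classification results than PoissonC and PoissonS.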
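The one-against-one scheme named in this abstract trains one binary classifier per pair of classes and lets them vote. The sketch below shows only that voting structure. As a loud assumption, it uses a nearest-centroid rule as a stand-in for the binary classifier; the study itself uses SVMs with a radial basis function kernel, and all function names here are hypothetical.

```python
from collections import Counter
from itertools import combinations


def train_centroid_pair(X, y, a, b):
    """Stand-in binary classifier for classes a and b (NOT an SVM):
    predicts whichever class mean is closer to the query point."""
    def mean(label):
        rows = [x for x, lab in zip(X, y) if lab == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    ca, cb = mean(a), mean(b)

    def predict(x):
        da = sum((xi - ci) ** 2 for xi, ci in zip(x, ca))
        db = sum((xi - ci) ** 2 for xi, ci in zip(x, cb))
        return a if da <= db else b

    return predict


def train_one_against_one(X, y, train_pair=train_centroid_pair):
    """Train one binary model per unordered pair of classes: k(k-1)/2 models."""
    classes = sorted(set(y))
    return [train_pair(X, y, a, b) for a, b in combinations(classes, 2)]


def predict_one_against_one(models, x):
    """Each pairwise model casts one vote; the most-voted class wins."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

Any binary learner with the same `train_pair(X, y, a, b)` shape could be swapped in for the stand-in, which keeps the pairwise voting logic separate from the choice of kernel.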
3.
Molecular classification of cancer types from microarray data using the combination of genetic algorithms and support vector machines (total citations: 11; self-citations: 0; citations by others: 11)
Simultaneous multiclass classification of tumor types is essential for future clinical implementations of microarray-based cancer diagnosis. In this study, we combined genetic algorithms (GAs) and all-paired support vector machines (SVMs) for multiclass cancer identification. The predictive features were selected through iterative SVM/GA runs followed by recursive feature elimination as a post-processing step, leading to a very compact cancer-related predictive gene set. Leave-one-out cross-validation yielded accuracies of 87.93% for the eight-class and 85.19% for the fourteen-class cancer classification, outperforming the results of previously published methods.
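The leave-one-out cross-validation used to report these accuracies can be illustrated with a short sketch. It shows only the evaluation loop: each sample is held out once and predicted from all the others. The 1-nearest-neighbour predictor is an assumed stand-in for the GA/SVM classifier of the study, and the function names are hypothetical.

```python
def nearest_neighbour_predict(train_X, train_y, x):
    """Stand-in classifier (NOT the study's GA/SVM model): label of the
    closest training point under squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]


def leave_one_out_accuracy(X, y, predict=nearest_neighbour_predict):
    """Hold out each sample in turn, predict it from the rest,
    and return the fraction of held-out samples predicted correctly."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)
```

When feature selection is part of the pipeline, as it is in this study, an unbiased estimate requires repeating the selection inside each fold rather than once on the full data set; the abstract does not say which was done.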
4.
Small interfering RNAs (siRNAs) are becoming widely used for sequence-specific gene silencing in mammalian cells, but designing an effective siRNA is still a challenging task. In this study, we developed an algorithm for predicting siRNA functionality by using a generalized string kernel (GSK) combined with a support vector machine (SVM). With the GSK, siRNA sequences were represented as vectors in a multi-dimensional feature space according to the numbers of subsequences in each siRNA, and were subsequently classified by the SVM as effective or ineffective. We applied this algorithm to published siRNAs and classified effective and ineffective siRNAs with 90.6% and 86.2% accuracy, respectively.
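Representing a sequence by the numbers of subsequences it contains can be sketched with a k-mer spectrum kernel. This is a simpler, swapped-in relative of the generalized string kernel, not the GSK itself, whose exact definition the abstract does not give; the function names and the choice `k=3` are assumptions for illustration.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every contiguous length-k subsequence (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=3):
    """Inner product of the two k-mer count vectors.

    Only k-mers present in both sequences contribute, so the full
    4**k-dimensional feature vector never has to be built explicitly.
    """
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(n * ct[kmer] for kmer, n in cs.items())
```

For example, `"AUGAUG"` contains the 3-mer `AUG` twice and `UGA` and `GAU` once each, so its kernel value with itself is 2*2 + 1*1 + 1*1 = 6.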