
1.

Ischemia episode detection in ECG using kernel density estimation, support vector machine and feature selection

Park J Pedrycz W Jeon M 《Biomedical engineering online》2012,11(1):30-22

ABSTRACT: BACKGROUND: Myocardial ischemia can develop into more serious diseases. Detecting the ischemic syndrome in the electrocardiogram (ECG) early, accurately, and automatically can prevent it from developing into a catastrophic disease. To this end, we propose a new method, which employs wavelets and simple feature selection. METHODS: For training and testing, the European ST-T database is used, which comprises 367 ischemic ST episodes in 90 records. We first remove baseline wandering and detect the time positions of QRS complexes by a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features that can be used for differentiating ST episodes from normal: 1) the area between the QRS offset and T-peak points, 2) the normalized and signed sum from the QRS offset to the effective zero voltage point, and 3) the slope from the QRS onset to the offset point. We average the feature values over five successive beats to reduce the effects of outliers. Finally, we apply classifiers to those features. RESULTS: We evaluated the algorithm with kernel density estimation (KDE) and support vector machine (SVM) methods. Sensitivity and specificity for KDE were 0.939 and 0.912, respectively. The KDE classifier detects 349 ischemic ST episodes out of a total of 367 ST episodes. Sensitivity and specificity of SVM were 0.941 and 0.923, respectively. The SVM classifier detects 355 ischemic ST episodes. CONCLUSIONS: We proposed a new method for detecting ischemia in ECG. It comprises signal processing techniques for removing baseline wandering and detecting the time positions of QRS complexes by discrete wavelet transform, and explicit feature extraction from the morphology of ECG waveforms. It was shown that the number of selected features was sufficient to discriminate ischemic ST episodes from normal ones. We also showed how the proposed KDE classifier can automatically select kernel bandwidths, meaning that the algorithm does not require any numerical parameter values to be supplied in advance. In the case of the SVM classifier, one has to select a single parameter.
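The five-beat averaging step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say whether the window slides or is non-overlapping, so a trailing sliding window is assumed here, and the function name and sample values are hypothetical.

```python
from statistics import fmean

def average_successive_beats(values, window=5):
    """Average each beat's feature value with the beats just before it.

    Assumption: a trailing sliding window; the first few beats use a
    shorter window because fewer than `window` beats are available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averaged = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        averaged.append(fmean(values[start:i + 1]))
    return averaged

# Hypothetical per-beat feature values with one outlier beat (index 3).
st_area = [1.0, 1.0, 1.0, 11.0, 1.0, 1.0]
smoothed = average_successive_beats(st_area)
```

In this toy record the outlier value of 11.0 is damped to 3.5 in the smoothed series, which is the effect the averaging is meant to achieve.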

2.

ESVM: evolutionary support vector machine for automatic feature selection and classification of microarray data (total citations: 1; self-citations: 0; citations by others: 1)

Huang HL Chang FL 《Bio Systems》2007,90(2):516-528

An optimal design of support vector machine (SVM)-based classifiers for prediction aims to optimize the combination of feature selection, parameter setting of SVM, and cross-validation methods. However, SVMs do not offer a mechanism for automatic internal detection of relevant features. The appropriate setting of their control parameters is often treated as another independent problem. This paper proposes an evolutionary approach to designing an SVM-based classifier (named ESVM) by simultaneous optimization of automatic feature selection and parameter tuning using an intelligent genetic algorithm, combined with k-fold cross-validation regarded as an estimator of generalization ability. To illustrate and evaluate the efficiency of ESVM, a typical application to microarray classification using 11 multi-class datasets is adopted. By considering model uncertainty, a frequency-based technique that votes on multiple sets of potentially informative features is used to identify the most effective subset of genes. It is shown that ESVM can obtain a high accuracy of 96.88% with a small number (10.0) of selected genes using 10-fold cross-validation, averaged over the 11 datasets. The merits of ESVM are three-fold: (1) automatic feature selection and parameter setting embedded into ESVM can advance prediction abilities, compared to traditional SVMs; (2) ESVM can serve not only as an accurate classifier but also as an adaptive feature extractor; (3) ESVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of ESVM for bioinformatics problems.

3.

Stable feature selection based on the ensemble L₁-norm support vector machine for biomarker discovery

Moon Myungjin Nakai Kenta 《BMC genomics》2016,17(13):65-74

Background

Lately, biomarker discovery has become one of the most significant research issues in the biomedical field. Owing to the presence of high-throughput technologies, genomic data, such as microarray data and RNA-seq, have become widely available. Many kinds of feature selection techniques have been applied to retrieve significant biomarkers from these kinds of data. However, they tend to be noisy with high-dimensional features and consist of a small number of samples; thus, conventional feature selection approaches might be problematic in terms of reproducibility.

Results

In this article, we propose a stable feature selection method for high-dimensional datasets. We apply an ensemble L₁-norm support vector machine to efficiently reduce irrelevant features, considering the stability of features. We define the stability score for each feature by aggregating the ensemble results, and utilize backward feature elimination on a purified feature set based on this score; therefore, it is possible to acquire an optimal set of features for performance without the need to set a specific threshold. The proposed methodology is evaluated by classifying the binary stage of renal clear cell carcinoma with RNA-seq data.

Conclusion

A comparison with established algorithms, i.e., a fast correlation-based filter, random forest, and an ensemble version of an L₂-norm support vector machine-based recursive feature elimination, enabled us to prove the superior performance of our method in terms of classification as well as stability in general. It is also shown that the proposed approach performs moderately on high-dimensional datasets consisting of a very large number of features and a smaller number of samples. The proposed approach is expected to be applicable to many other studies aimed at biomarker discovery.


4.

Antepartum fetal heart rate feature extraction and classification using empirical mode decomposition and support vector machine

Niranjana Krupa Mohd Ali MA Edmond Zahedi Shuhaila Ahmed Fauziah M Hassan 《Biomedical engineering online》2011,10(1):6

Background

Cardiotocography (CTG) is the most widely used tool for fetal surveillance. The visual analysis of fetal heart rate (FHR) traces largely depends on the expertise and experience of the clinician involved. Several approaches have been proposed for the effective interpretation of FHR. In this paper, a new approach for FHR feature extraction based on empirical mode decomposition (EMD) is proposed, which was used along with support vector machine (SVM) for the classification of FHR recordings as 'normal' or 'at risk'.

5.

Identification of the functional alteration signatures across different cancer types with support vector machine and feature analysis

ShaoPeng Wang YuDong Cai 《Biochimica et Biophysica Acta: Molecular Basis of Disease》2018,1864(6):2218-2227

Cancers are regarded as malignant proliferations of tumor cells present in many tissues and organs, which can severely curtail the quality of human life. The potential of using plasma DNA for cancer detection has been widely recognized, leading to the need to map the tissue-of-origin through the identification of somatic mutations. With cutting-edge technologies, such as next-generation sequencing, numerous somatic mutations have been identified, and the mutation signatures have been uncovered across different cancer types. However, somatic mutations are not independent events in carcinogenesis but exert functional effects. In this study, we applied a pan-cancer analysis to five types of cancers: (I) breast cancer (BRCA), (II) colorectal adenocarcinoma (COADREAD), (III) head and neck squamous cell carcinoma (HNSC), (IV) kidney renal clear cell carcinoma (KIRC), and (V) ovarian cancer (OV). Based on the mutated genes of patients suffering from one of the aforementioned cancer types, the patients were encoded into a large number of numerical values based upon the enrichment theory of gene ontology (GO) terms and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. We analyzed these features with the Monte-Carlo Feature Selection (MCFS) method, followed by the incremental feature selection (IFS) method to identify functional alteration features that could be used to build the support vector machine (SVM)-based classifier for distinguishing the five types of cancers. Our results showed that the optimal classifier with the selected 344 features had the highest Matthews correlation coefficient value of 0.523. Sixteen decision rules produced by the MCFS method can yield an overall accuracy of 0.498 for the classification of the five cancer types. Further analysis indicated that some of these features and rules were supported by previous experiments. This study not only presents a new approach to mapping the tissue-of-origin for cancer detection but also unveils the specific functional alterations of each cancer type, providing insight into cancer-specific functional aberrations as potential therapeutic targets. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.

6.

A new regularized least squares support vector regression for gene selection

Pei-Chun Chen Su-Yun Huang Wei J Chen Chuhsing K Hsiao 《BMC bioinformatics》2009,10(1):44

Background

Selection of influential genes with microarray data often faces the difficulties of a large number of genes and a relatively small group of subjects. In addition to the curse of dimensionality, many gene selection methods weight the contribution from each individual subject equally. This equal-contribution assumption cannot account for the possible dependence among subjects who associate similarly to the disease, and may restrict the selection of influential genes.

7.

Prediction of prostate cancer using hair trace element concentration and support vector machine method

Guo J Deng W Zhang L Li C Wu P Mao P 《Biological trace element research》2007,116(3):257-271

A change in the normal concentration of essential trace elements in the human body might lead to major health disturbances. In this study, hair samples were collected from 115 human subjects, including 55 healthy people and 60 patients with prostate cancer. The concentrations of 20 trace elements (TEs) in these samples were measured by inductively coupled plasma-mass spectrometry. A support vector machine was used to investigate the relationship between TEs and prostate cancer. It is found that, among the 20 TEs, 10 (Mg, P, K, Ca, Cr, Mn, Fe, Cu, Zn, and Se) are related to the risk of prostate cancer. These 10 TEs were used to build the prediction model for prostate cancer. The model obtained can satisfactorily distinguish the healthy samples from the cancer samples. Furthermore, leave-one-out cross-validation showed that the prediction ability of this model reaches as high as 95.8%. It is practical to predict the risk of prostate cancer using this model in the clinic.
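Leave-one-out cross-validation, as mentioned in the abstract, can be sketched as below. This is only an illustration under stated assumptions: a nearest-centroid rule stands in for the SVM so the example needs no third-party library, and all function names and concentration values are hypothetical rather than taken from the study.

```python
def leave_one_out_accuracy(samples, labels, fit, predict):
    """Hold out each sample once, train on the rest, score the held-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

def fit_centroids(xs, ys):
    """Stand-in classifier (not an SVM): one mean vector per class."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(rows) for col in zip(*rows)]
            for y, rows in groups.items()}

def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

# Hypothetical two-element concentration vectors, for illustration only.
samples = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]]
labels = ["healthy", "healthy", "healthy", "cancer", "cancer", "cancer"]
accuracy = leave_one_out_accuracy(samples, labels, fit_centroids, predict_centroid)
```

Passing `fit` and `predict` as arguments keeps the resampling loop independent of the classifier, so an SVM implementation could be substituted without changing it.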

8.

Recognition and classification of histones using support vector machine.

Manoj Bhasin Ellis L Reinherz Pedro A Reche 《Journal of computational biology》2006,13(1):102-112

Histones are DNA-binding proteins found in the chromatin of all eukaryotic cells. They are highly conserved and can be grouped into five major classes: H1/H5, H2A, H2B, H3, and H4. Two copies of H2A, H2B, H3, and H4 bind to about 160 base pairs of DNA forming the core of the nucleosome (the repeating structure of chromatin) and H1/H5 bind to its DNA linker sequence. Overall, histones have a high arginine/lysine content that is optimal for interaction with DNA. This sequence bias can make the classification of histones difficult using standard sequence similarity approaches. Therefore, in this paper, we applied support vector machine (SVM) to recognize and classify histones on the basis of their amino acid and dipeptide composition. On evaluation through a five-fold cross-validation, the SVM-based method was able to distinguish histones from nonhistones (nuclear proteins) with an accuracy around 98%. Similarly, we obtained an overall >95% accuracy in discriminating the five classes of histones through the application of 1-versus-rest (1-v-r) SVM. Finally, we have applied this SVM-based method to the detection of histones from whole proteomes and found a comparable sensitivity to that accomplished by hidden Markov motifs (HMM) profiles.
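Amino acid and dipeptide composition features of the kind named in the abstract can be sketched as follows. The abstract does not give the exact encoding, so this assumes the usual definitions (residue fractions, and fractions of overlapping ordered residue pairs); the function names and the example fragment are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(seq):
    """Fraction of each standard residue in the sequence (20 features)."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among overlapping pairs (400 features)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(a + b) / len(pairs)
            for a, b in product(AMINO_ACIDS, repeat=2)]

# Hypothetical lysine/arginine-rich fragment, not a real histone sequence.
fragment = "KRKAK"
features = amino_acid_composition(fragment) + dipeptide_composition(fragment)
```

Each protein becomes a fixed-length vector of 420 values regardless of sequence length, which is what lets a classifier such as an SVM consume it directly.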

9.

Improved method for predicting beta-turn using support vector machine (total citations: 2; self-citations: 0; citations by others: 2)

Zhang Q Yoon S Welsh WJ 《Bioinformatics (Oxford, England)》2005,21(10):2370-2374

MOTIVATION: Numerous methods for predicting beta-turns in proteins have been developed based on various computational schemes. Here, we introduce a new method of beta-turn prediction that uses the support vector machine (SVM) algorithm together with predicted secondary structure information. Various parameters from the SVM have been adjusted to achieve optimal prediction performance. RESULTS: The SVM method achieved excellent performance as measured by the Matthews correlation coefficient (MCC = 0.45) using a 7-fold cross validation on a database of 426 non-homologous protein chains. To the best of our knowledge, this MCC value is the highest achieved so far for predicting beta-turns. The overall prediction accuracy Qtotal was 77.3%, which is the best among the existing prediction methods. Among its unique attractive features, the present SVM method avoids overtraining, compresses information, and provides a predicted reliability index.
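For readers unfamiliar with the Matthews correlation coefficient reported above, the binary form can be computed from confusion-matrix counts as in this minimal sketch. The counts are hypothetical and do not come from the paper; returning 0.0 when the denominator is zero is a common convention, adopted here as an assumption.

```python
from math import sqrt

def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC from true/false positive and negative counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom

# Hypothetical confusion-matrix counts, for illustration only.
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
```

MCC ranges from -1 to 1 and, unlike plain accuracy, stays informative when one class is much rarer than the other.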

10.

Ecological footprint model using the support vector machine technique

Ma H Chang W Cui G 《PloS one》2012,7(1):e30396

The per capita ecological footprint (EF) is one of the most widely recognized measures of environmental sustainability. It aims to quantify the Earth's biological resources required to support human activity. In this paper, we summarize relevant previous literature, and present five factors that influence per capita EF. These factors are: national gross domestic product (GDP), urbanization (independent of economic development), distribution of income (measured by the Gini coefficient), export dependence (measured by the percentage of exports to total GDP), and service intensity (measured by the percentage of service to total GDP). A new ecological footprint model based on a support vector machine (SVM), which is a machine-learning method based on the structural risk minimization principle from statistical learning theory, was used to calculate the per capita EF of 24 nations using data from 123 nations. The calculation accuracy was measured by average absolute error and average relative error, which were 0.004883 and 0.351078%, respectively. Our results demonstrate that the EF model based on SVM has good calculation performance.
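The two accuracy measures named in the abstract can be sketched as follows, assuming their usual definitions (mean of absolute errors, and mean of absolute errors divided by the true value, expressed as a percentage). The function names and sample values are hypothetical.

```python
def average_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over all cases."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def average_relative_error_percent(actual, predicted):
    """Mean of |actual - predicted| / |actual|, as a percentage."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)

# Hypothetical per capita EF values, for illustration only.
actual = [2.0, 4.0, 5.0]
predicted = [2.1, 3.8, 5.0]
abs_err = average_absolute_error(actual, predicted)
rel_err = average_relative_error_percent(actual, predicted)
```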

11.

Normalization and integration of large-scale metabolomics data using support vector regression (total citations: 1; self-citations: 0; citations by others: 1)

Xiaotao Shen Xiaoyun Gong Yuping Cai Yuan Guo Jia Tu Hao Li Tao Zhang Jialin Wang Fuzhong Xue Zheng-Jiang Zhu 《Metabolomics : Official journal of the Metabolomic Society》2016,12(5):89

Introduction

Untargeted metabolomics studies for biomarker discovery often have hundreds to thousands of human samples. Data acquisition of large-scale samples has to be divided into several batches and may span from months to as long as several years. The signal drift of metabolites during data acquisition (intra- and inter-batch) is unavoidable and is a major confounding factor for large-scale metabolomics studies.

Objectives

We aim to develop a data normalization method to reduce unwanted variations and integrate multiple batches in large-scale metabolomics studies prior to statistical analyses.

Methods

We developed a machine learning algorithm-based method, support vector regression (SVR), for large-scale metabolomics data normalization and integration. An R package named MetNormalizer was developed and provided for data processing using SVR normalization.

Results

After SVR normalization, the portion of metabolite ion peaks with relative standard deviations (RSDs) less than 30 % increased to more than 90 % of the total peaks, which is much better than other common normalization methods. The reduction of unwanted analytical variations helps to improve the performance of multivariate statistical analyses, both unsupervised and supervised, in terms of classification and prediction accuracy so that subtle metabolic changes in epidemiological studies can be detected.

Conclusion

SVR normalization can effectively remove the unwanted intra- and inter-batch variations, and is much better than other common normalization methods.
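The relative standard deviation (RSD) criterion used in the Results can be sketched as below. This assumes RSD is the sample standard deviation divided by the mean, computed per peak over repeated measurements; the abstract does not state which samples are used, and the function names and intensity values are hypothetical.

```python
from statistics import mean, stdev

def relative_standard_deviation(intensities):
    """RSD in percent: sample standard deviation divided by the mean."""
    return 100 * stdev(intensities) / mean(intensities)

def fraction_below_threshold(peaks, threshold=30.0):
    """Share of peaks whose RSD is below the threshold (in percent)."""
    rsds = [relative_standard_deviation(v) for v in peaks.values()]
    return sum(r < threshold for r in rsds) / len(rsds)

# Hypothetical repeated intensity measurements for two metabolite ion peaks.
peaks = {
    "peak_A": [100.0, 110.0, 90.0],   # stable signal
    "peak_B": [100.0, 200.0, 30.0],   # strongly drifting signal
}
share = fraction_below_threshold(peaks)
```

Here only `peak_A` falls under the 30 % cut-off, so the share is one half; the paper reports this share rising above 90 % after normalization.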


12.

Gene selection using support vector machines with non-convex penalty (total citations: 2; self-citations: 0; citations by others: 2)

Zhang HH Ahn J Lin X Park C 《Bioinformatics (Oxford, England)》2006,22(1):88-95

MOTIVATION: With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes simultaneously in one single experiment. One current difficulty in interpreting microarray data comes from their innate nature of 'high-dimensional low sample size'. Therefore, robust and accurate gene selection methods are required to identify differentially expressed group of genes across different samples, e.g. between cancerous and normal cells. Successful gene selection will help to classify different cancer types, lead to a better understanding of genetic signatures in cancers and improve treatment strategies. Although gene selection and cancer classification are two closely related problems, most existing approaches handle them separately by selecting genes prior to classification. We provide a unified procedure for simultaneous gene selection and cancer classification, achieving high accuracy in both aspects. RESULTS: In this paper we develop a novel type of regularization in support vector machines (SVMs) to identify important genes for cancer classification. A special nonconvex penalty, called the smoothly clipped absolute deviation penalty, is imposed on the hinge loss function in the SVM. By systematically thresholding small estimates to zeros, the new procedure eliminates redundant genes automatically and yields a compact and accurate classifier. A successive quadratic algorithm is proposed to convert the non-differentiable and non-convex optimization problem into easily solved linear equation systems. The method is applied to two real datasets and has produced very promising results. AVAILABILITY: MATLAB codes are available upon request from the authors.
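The smoothly clipped absolute deviation (SCAD) penalty named above has a simple piecewise form, sketched here for a single coefficient. This shows the penalty function only, not the authors' MATLAB code or their successive quadratic algorithm; the default a = 3.7 is the value commonly used in the SCAD literature and is an assumption here, as are the function and parameter names.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for one coefficient, with tuning parameters lam > 0 and a > 2."""
    t = abs(theta)
    if t <= lam:
        # Small coefficients: behaves like the L1 penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate range: quadratic segment that flattens the penalty.
        return -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return (a + 1) * lam * lam / 2

small = scad_penalty(0.5, lam=1.0)
large = scad_penalty(10.0, lam=1.0)
```

Because the penalty is constant beyond `a * lam`, large coefficients are left almost unshrunk while small ones are pushed toward zero, which is the behaviour the abstract relies on for eliminating redundant genes.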

13.

Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

Alvaro J González Li Liao 《BMC bioinformatics》2010,11(1):537

Background

Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles.

14.

Prediction of piRNAs using transposon interaction and a support vector machine

Kai Wang Chun Liang Jinding Liu Huamei Xiao Shuiqing Huang Jianhua Xu Fei Li 《BMC bioinformatics》2014,15(1)


15.

Accurate identification of alternatively spliced exons using support vector machine (total citations: 6; self-citations: 0; citations by others: 6)

Dror G Sorek R Shamir R 《Bioinformatics (Oxford, England)》2005,21(7):897-901


16.

Identifying translation initiation sites in prokaryotes using support vector machine

Tingting Gao Yong Wang 《Journal of theoretical biology》2010,262(4):644-8164


17.

GISMO--gene identification using a support vector machine for ORF classification


Krause L McHardy AC Nattkemper TW Pühler A Stoye J Meyer F 《Nucleic acids research》2007,35(2):540-549

We present the novel prokaryotic gene finder GISMO, which combines searches for protein family domains with composition-based classification based on a support vector machine. GISMO is highly accurate, exhibiting high sensitivity and specificity in gene identification. We found that it performs well for complete prokaryotic chromosomes, irrespective of their GC content, and also for plasmids as short as 10 kb, short genes and genes with atypical sequence composition. Using GISMO, we found several thousand new predictions for the published genomes that are supported by extrinsic evidence, which strongly suggests that these are very likely biologically active genes. The source code for GISMO is freely available under the GPL license.

18.

Prediction of bioactivity of ACAT2 inhibitors by multilinear regression analysis and support vector machine

Min Zhong Shouyi Xuan Ling Wang Xiaoli Hou Maolin Wang Aixia Yan Bin Dai 《Bioorganic & medicinal chemistry letters》2013,23(13):3788-3792


19.

Gene selection algorithms for microarray data based on least squares support vector machine (total citations: 1; self-citations: 0; citations by others: 1)

E Ke Tang PN Suganthan Xin Yao 《BMC bioinformatics》2006,7(1):95-16

Background

In discriminant analysis of microarray data, usually a small number of samples are expressed by a large number of genes. It is not only difficult but also unnecessary to conduct the discriminant analysis with all the genes. Hence, gene selection is usually performed to select important genes.

20.

Quantitative prediction of mouse class I MHC peptide binding affinity using support vector machine regression (SVR) models

Wen Liu Xiangshan Meng Qiqi Xu Darren R Flower Tongbin Li 《BMC bioinformatics》2006,7(1):182-13

Background

The binding between peptide epitopes and major histocompatibility complex proteins (MHCs) is an important event in the cellular immune response. Accurate prediction of the binding between short peptides and the MHC molecules has long been a principal challenge for immunoinformatics. Recently, the modeling of MHC-peptide binding has come to emphasize quantitative predictions: instead of categorizing peptides as "binders" or "non-binders" or as "strong binders" and "weak binders", recent methods seek to make predictions about precise binding affinities.