Found 20 similar documents (search time: 0 ms)
1.
ABSTRACT: BACKGROUND: Myocardial ischemia can develop into more serious diseases. Detecting the ischemic syndrome in the electrocardiogram (ECG) early, accurately, and automatically can prevent it from progressing to a catastrophic disease. To this end, we propose a new method, which employs wavelets and simple feature selection. METHODS: For training and testing, the European ST-T database is used, which comprises 367 ischemic ST episodes in 90 records. We first remove baseline wandering, and detect the time positions of QRS complexes by a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features which can be used for differentiating ST episodes from normal beats: 1) the area between the QRS offset and T-peak points, 2) the normalized and signed sum from the QRS offset to the effective zero-voltage point, and 3) the slope from the QRS onset to the offset point. We average the feature values over five successive beats to reduce the effect of outliers. Finally, we apply classifiers to those features. RESULTS: We evaluated the algorithm with kernel density estimation (KDE) and support vector machine (SVM) methods. Sensitivity and specificity for KDE were 0.939 and 0.912, respectively. The KDE classifier detects 349 ischemic ST episodes out of the total 367 ST episodes. Sensitivity and specificity of SVM were 0.941 and 0.923, respectively. The SVM classifier detects 355 ischemic ST episodes. CONCLUSIONS: We proposed a new method for detecting ischemia in ECG. It combines signal processing techniques, removing baseline wandering and detecting the time positions of QRS complexes with the discrete wavelet transform, with explicit feature extraction from the morphology of the ECG waveforms. It was shown that the selected features were sufficient to discriminate ischemic ST episodes from normal ones. We also showed how the proposed KDE classifier can automatically select kernel bandwidths, meaning that the algorithm does not require any numerical parameter values to be supplied in advance. In the case of the SVM classifier, one has to select a single parameter.
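As a hedged illustration of the KDE classifier idea (not the authors' implementation), the sketch below builds class-conditional Gaussian kernel density estimates over three hypothetical beat features and labels a beat by comparing the two densities; scipy's gaussian_kde chooses its bandwidth automatically (Scott's rule), echoing the paper's point that no bandwidth needs to be supplied by hand.

```python
# Minimal sketch (not the authors' implementation): classify beats by
# comparing class-conditional densities estimated with Gaussian KDE.
# scipy's gaussian_kde picks its bandwidth automatically (Scott's rule).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical 3-D feature vectors (area, signed sum, slope) per beat.
normal_feats = rng.normal(0.0, 1.0, size=(200, 3))
ischemic_feats = rng.normal(1.5, 1.0, size=(200, 3))

# gaussian_kde expects shape (n_dims, n_samples).
kde_normal = gaussian_kde(normal_feats.T)
kde_ischemic = gaussian_kde(ischemic_feats.T)

def classify(beat_features):
    """Label a beat 'ischemic' if its density under the ischemic KDE
    exceeds its density under the normal KDE."""
    x = np.asarray(beat_features).reshape(3, 1)
    return "ischemic" if kde_ischemic(x)[0] > kde_normal(x)[0] else "normal"

print(classify([1.4, 1.6, 1.2]))  # likely 'ischemic' under this toy data
```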
2.
An optimal design of support vector machine (SVM)-based classifiers for prediction aims to optimize the combination of feature selection, SVM parameter settings, and cross-validation methods. However, SVMs do not offer a mechanism for automatic detection of relevant features, and the appropriate setting of their control parameters is often treated as a separate, independent problem. This paper proposes an evolutionary approach to designing an SVM-based classifier (named ESVM) by simultaneously optimizing automatic feature selection and parameter tuning with an intelligent genetic algorithm, combined with k-fold cross-validation as an estimator of generalization ability. To illustrate and evaluate the efficiency of ESVM, a typical application to microarray classification using 11 multi-class datasets is adopted. To account for model uncertainty, a frequency-based technique that votes on multiple sets of potentially informative features is used to identify the most effective subset of genes. It is shown that, averaged over the 11 datasets, ESVM obtains a high accuracy of 96.88% with only 10.0 selected genes on average under 10-fold cross-validation. The merits of ESVM are three-fold: (1) the automatic feature selection and parameter setting embedded in ESVM improve prediction ability compared to traditional SVMs; (2) ESVM can serve not only as an accurate classifier but also as an adaptive feature extractor; (3) ESVM is developed as an efficient tool in which various SVMs can conveniently be used as the core for bioinformatics problems.
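The following is a minimal sketch of the joint optimization idea, assuming a plain genetic algorithm rather than the paper's intelligent GA: each chromosome encodes a binary gene mask together with log-scaled C and gamma, and fitness is k-fold cross-validation accuracy. All data, names, and parameter ranges are illustrative.

```python
# Illustrative sketch, not the ESVM implementation: a plain GA whose
# chromosome jointly encodes (feature mask, C, gamma); fitness is k-fold
# cross-validation accuracy of an RBF-kernel SVM on the masked features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=80, n_features=50, n_informative=8,
                           random_state=0)  # stand-in for microarray data

def random_individual():
    return {"mask": rng.random(X.shape[1]) < 0.2,   # sparse gene subset
            "logC": rng.uniform(-2, 4),             # C in [1e-2, 1e4]
            "logG": rng.uniform(-5, 1)}             # gamma in [1e-5, 1e1]

def fitness(ind):
    if not ind["mask"].any():
        return 0.0
    clf = SVC(C=10 ** ind["logC"], gamma=10 ** ind["logG"])
    return cross_val_score(clf, X[:, ind["mask"]], y, cv=5).mean()

def mutate(ind):
    child = {"mask": ind["mask"].copy(),
             "logC": ind["logC"] + rng.normal(0, 0.3),
             "logG": ind["logG"] + rng.normal(0, 0.3)}
    flip = rng.random(child["mask"].size) < 0.05    # flip a few gene bits
    child["mask"] ^= flip
    return child

pop = [random_individual() for _ in range(20)]
for generation in range(15):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [mutate(p) for p in pop[:10]]  # elitism + mutation

best = max(pop, key=fitness)
print(f"CV accuracy {fitness(best):.3f} with {best['mask'].sum()} genes")
```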
3.
Background: Lately, biomarker discovery has become one of the most significant research issues in the biomedical field. Owing to the presence of high-throughput technologies, genomic data, such as microarray data and RNA-seq, have become widely available. Many kinds of feature selection techniques have been applied to retrieve significant biomarkers from these kinds of data. However, such data tend to be noisy, with high-dimensional features and a small number of samples; thus, conventional feature selection approaches can be problematic in terms of reproducibility. Results: In this article, we propose a stable feature selection method for high-dimensional datasets. We apply an ensemble L1-norm support vector machine to efficiently reduce irrelevant features while taking the stability of features into account. We define a stability score for each feature by aggregating the ensemble results, and apply backward feature elimination to a feature set purified by this score; it is therefore possible to acquire an optimal set of features for performance without setting a specific threshold. The proposed methodology is evaluated by classifying the binary stage of renal clear cell carcinoma with RNA-seq data. Conclusion: A comparison with established algorithms, i.e., a fast correlation-based filter, random forest, and an ensemble version of L2-norm support vector machine-based recursive feature elimination, demonstrates the superior performance of our method in terms of both classification and stability in general. It is also shown that the proposed approach performs moderately well on high-dimensional datasets consisting of a very large number of features and a smaller number of samples. The proposed approach is expected to be applicable to much other research aimed at biomarker discovery.
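A minimal sketch of the ensemble L1-norm SVM selection step, assuming bootstrap resampling and scikit-learn's LinearSVC; the stability score here is simply the fraction of resampling rounds in which a feature receives a nonzero coefficient. The backward-elimination stage and the authors' exact aggregation are not reproduced.

```python
# Sketch of the general idea (not the paper's code): fit an L1-penalized
# linear SVM on bootstrap resamples and score each feature by how often
# it receives a nonzero coefficient; stable features survive resampling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=100, n_features=200, n_informative=10,
                           random_state=0)  # stand-in for RNA-seq features

n_rounds = 50
selected = np.zeros(X.shape[1])
for _ in range(n_rounds):
    idx = rng.integers(0, len(y), size=len(y))       # bootstrap resample
    svm = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000)
    svm.fit(X[idx], y[idx])
    selected += (np.abs(svm.coef_.ravel()) > 1e-8)   # nonzero -> selected

stability = selected / n_rounds                       # per-feature score
stable_set = np.flatnonzero(stability >= 0.6)
print(f"{stable_set.size} features selected in >=60% of rounds")
```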
4.
Background: Cardiotocography (CTG) is the most widely used tool for fetal surveillance. The visual analysis of fetal heart rate (FHR) traces depends largely on the expertise and experience of the clinician involved. Several approaches have been proposed for the effective interpretation of FHR. In this paper, a new approach to FHR feature extraction based on empirical mode decomposition (EMD) is proposed; it is used together with a support vector machine (SVM) to classify FHR recordings as 'normal' or 'at risk'.
5.
Cancers are regarded as malignant proliferations of tumor cells present in many tissues and organs, which can severely curtail the quality of human life. The potential of using plasma DNA for cancer detection has been widely recognized, leading to the need to map the tissue-of-origin through the identification of somatic mutations. With cutting-edge technologies, such as next-generation sequencing, numerous somatic mutations have been identified, and mutation signatures have been uncovered across different cancer types. However, somatic mutations are not independent events in carcinogenesis but exert functional effects. In this study, we applied a pan-cancer analysis to five types of cancers: (I) breast cancer (BRCA), (II) colorectal adenocarcinoma (COADREAD), (III) head and neck squamous cell carcinoma (HNSC), (IV) kidney renal clear cell carcinoma (KIRC), and (V) ovarian cancer (OV). Based on the mutated genes of patients suffering from one of the aforementioned cancer types, each patient was encoded as a large vector of numerical values derived from the enrichment theory of gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. We analyzed these features with the Monte-Carlo Feature Selection (MCFS) method, followed by the incremental feature selection (IFS) method, to identify functional alteration features that could be used to build a support vector machine (SVM)-based classifier for distinguishing the five types of cancers. Our results showed that the optimal classifier with the selected 344 features had the highest Matthews correlation coefficient value of 0.523. Sixteen decision rules produced by the MCFS method can yield an overall accuracy of 0.498 for the classification of the five cancer types. Further analysis indicated that some of these features and rules were supported by previous experiments. This study not only presents a new approach to mapping the tissue-of-origin for cancer detection but also unveils the specific functional alterations of each cancer type, providing insight into cancer-specific functional aberrations as potential therapeutic targets. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.
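A small sketch of the incremental feature selection (IFS) stage, with a univariate F-score standing in for the MCFS importance ranking; the best-scoring prefix of the ranked feature list is kept. Dataset, step size, and names are illustrative.

```python
# Illustrative IFS loop (not the authors' pipeline): features are assumed
# pre-ranked by importance (MCFS in the paper; a univariate F-score here),
# and we keep the top-k prefix with the best cross-validated SVM score.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=150, n_features=100, n_informative=12,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

f_scores, _ = f_classif(X, y)            # stand-in for MCFS importance
ranking = np.argsort(f_scores)[::-1]     # best-ranked feature first

best_k, best_score = 1, -np.inf
for k in range(1, len(ranking) + 1, 5):  # step 5 to keep the demo fast
    subset = ranking[:k]
    score = cross_val_score(SVC(), X[:, subset], y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(f"optimal prefix: top {best_k} features, CV accuracy {best_score:.3f}")
```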
6.
Background: Selection of influential genes with microarray data often faces the difficulties of a large number of genes and a relatively small group of subjects. In addition to the curse of dimensionality, many gene selection methods weight the contribution from each individual subject equally. This equal-contribution assumption cannot account for the possible dependence among subjects who associate similarly to the disease, and may restrict the selection of influential genes.
7.
A change in the normal concentration of essential trace elements in the human body might lead to major health disturbances. In this study, hair samples were collected from 115 human subjects, including 55 healthy people and 60 patients with prostate cancer. The concentrations of 20 trace elements (TEs) in these samples were measured by inductively coupled plasma mass spectrometry. A support vector machine was used to investigate the relationship between TEs and prostate cancer. It was found that, among the 20 TEs, 10 (Mg, P, K, Ca, Cr, Mn, Fe, Cu, Zn, and Se) are related to the risk of prostate cancer. These 10 TEs were used to build the prediction model for prostate cancer. The resulting model can satisfactorily distinguish the healthy samples from the cancer samples. Furthermore, leave-one-out cross-validation showed that the prediction accuracy of this model reaches as high as 95.8%. It is practical to predict the risk of prostate cancer with this model in the clinic.
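A minimal leave-one-out cross-validation sketch in the spirit of the study above, with synthetic stand-ins for the 10 element concentrations; the pipeline scales features first, which matters for SVMs on concentrations with very different ranges.

```python
# Minimal sketch of leave-one-out cross-validation (LOOCV) for a small
# cohort; data here are synthetic stand-ins for the 10 trace-element
# concentrations per subject, not the study's measurements.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(115, 10))           # 115 subjects x 10 trace elements
y = np.r_[np.zeros(55), np.ones(60)]     # 0 = healthy, 1 = prostate cancer
X[y == 1] += 0.8                         # toy class separation

# Scaling matters for SVMs on concentration data with different ranges.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
acc = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()
print(f"LOOCV accuracy: {acc:.3f}")
```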
8.
Histones are DNA-binding proteins found in the chromatin of all eukaryotic cells. They are highly conserved and can be grouped into five major classes: H1/H5, H2A, H2B, H3, and H4. Two copies each of H2A, H2B, H3, and H4 bind to about 160 base pairs of DNA, forming the core of the nucleosome (the repeating structure of chromatin), while H1/H5 bind to its DNA linker sequence. Overall, histones have a high arginine/lysine content that is optimal for interaction with DNA. This sequence bias can make the classification of histones difficult using standard sequence similarity approaches. Therefore, in this paper, we applied a support vector machine (SVM) to recognize and classify histones on the basis of their amino acid and dipeptide composition. On evaluation through five-fold cross-validation, the SVM-based method was able to distinguish histones from nonhistones (nuclear proteins) with an accuracy of around 98%. Similarly, we obtained an overall >95% accuracy in discriminating the five classes of histones through the application of one-versus-rest (1-v-r) SVM. Finally, we applied this SVM-based method to the detection of histones from whole proteomes and found a sensitivity comparable to that accomplished by hidden Markov model (HMM) profiles.
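The amino acid and dipeptide composition encoding is straightforward to sketch; the function below (illustrative, not the authors' code) maps a protein sequence to a 420-dimensional frequency vector (20 amino acid plus 400 dipeptide frequencies) suitable as SVM input.

```python
# Sketch of the composition encoding described above (names illustrative):
# a protein sequence becomes a 420-dimensional vector of 20 amino acid
# frequencies plus 400 dipeptide frequencies, suitable as SVM input.
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def composition_features(seq: str) -> list[float]:
    seq = seq.upper()
    n = len(seq)
    aa = [seq.count(a) / n for a in AMINO_ACIDS]
    # n - 1 overlapping dipeptides in a sequence of length n
    pairs = [seq[i:i + 2] for i in range(n - 1)]
    dp = [pairs.count(d) / (n - 1) for d in DIPEPTIDES]
    return aa + dp  # 20 + 400 = 420 features

vec = composition_features("MARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALRE")
print(len(vec), round(sum(vec[:20]), 3))  # 420 features; AA fractions sum to 1
```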
9.
MOTIVATION: Numerous methods for predicting beta-turns in proteins have been developed based on various computational schemes. Here, we introduce a new method of beta-turn prediction that uses the support vector machine (SVM) algorithm together with predicted secondary structure information. Various parameters of the SVM have been adjusted to achieve optimal prediction performance. RESULTS: The SVM method achieved excellent performance as measured by the Matthews correlation coefficient (MCC = 0.45) using 7-fold cross-validation on a database of 426 non-homologous protein chains. To the best of our knowledge, this MCC value is the highest achieved so far for beta-turn prediction. The overall prediction accuracy Qtotal was 77.3%, which is the best among the existing prediction methods. Among its unique attractive features, the present SVM method avoids overtraining, compresses information, and provides a predicted reliability index.
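For reference, the Matthews correlation coefficient quoted above is computed from the confusion matrix as shown below; the counts in the example are illustrative, not the paper's.

```python
# Quick reference for the Matthews correlation coefficient (MCC);
# the example counts are made up, not the paper's confusion matrix.
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(round(mcc(tp=120, tn=700, fp=80, fn=100), 3))
```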
10.
The per capita ecological footprint (EF) is one of the most widely recognized measures of environmental sustainability. It aims to quantify the Earth's biological resources required to support human activity. In this paper, we summarize the relevant previous literature and present five factors that influence per capita EF: national gross domestic product (GDP), urbanization (independent of economic development), distribution of income (measured by the Gini coefficient), export dependence (measured by the percentage of exports in total GDP), and service intensity (measured by the percentage of services in total GDP). A new ecological footprint model based on a support vector machine (SVM), a machine-learning method built on the structural risk minimization principle from statistical learning theory, was constructed to calculate the per capita EF of 24 nations using data from 123 nations. The calculation accuracy was measured by the average absolute error and the average relative error, which were 0.004883 and 0.351078%, respectively. Our results demonstrate that the EF model based on SVM has good calculation performance.
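A hedged sketch of an SVM regression model evaluated with the two error measures used above, average absolute error and average relative error; the five input indicators and the train/test split are synthetic stand-ins, not the study's data.

```python
# Sketch of an SVM regression model with the two error measures quoted
# above; inputs are synthetic stand-ins for the five national indicators.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(123, 5))                 # GDP, urbanization, Gini, ...
y = 2.0 + X @ np.array([0.5, 0.3, -0.2, 0.1, 0.4]) + rng.normal(0, 0.1, 123)

train, test = slice(0, 99), slice(99, 123)    # 24 held-out "nations"
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X[train], y[train])
pred = model.predict(X[test])

avg_abs_err = np.mean(np.abs(pred - y[test]))
avg_rel_err = np.mean(np.abs((pred - y[test]) / y[test])) * 100
print(f"average absolute error: {avg_abs_err:.4f}")
print(f"average relative error: {avg_rel_err:.4f}%")
```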
11.
Introduction: Untargeted metabolomics studies for biomarker discovery often have hundreds to thousands of human samples. Data acquisition of large-scale samples has to be divided into several batches and may span from months to as long as several years. The signal drift of metabolites during data acquisition (intra- and inter-batch) is unavoidable and is a major confounding factor for large-scale metabolomics studies. Objectives: We aim to develop a data normalization method to reduce unwanted variations and integrate multiple batches in large-scale metabolomics studies prior to statistical analyses. Methods: We developed a machine learning algorithm-based method, support vector regression (SVR), for large-scale metabolomics data normalization and integration. An R package named MetNormalizer was developed and provided for data processing using SVR normalization. Results: After SVR normalization, the portion of metabolite ion peaks with relative standard deviations (RSDs) less than 30% increased to more than 90% of the total peaks, which is much better than other common normalization methods. The reduction of unwanted analytical variations helps to improve the performance of multivariate statistical analyses, both unsupervised and supervised, in terms of classification and prediction accuracy, so that subtle metabolic changes in epidemiological studies can be detected. Conclusion: SVR normalization can effectively remove unwanted intra- and inter-batch variations, and is much better than other common normalization methods.
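MetNormalizer itself is an R package; the Python sketch below only illustrates the core SVR idea under simplified assumptions: fit a regression of quality-control (QC) intensities against injection order, then divide each sample's intensity by the predicted drift curve.

```python
# Illustrative Python analogue of SVR-based drift correction (not the
# MetNormalizer code): model one peak's QC intensities as a function of
# injection order, then remove the predicted drift from all samples.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
order = np.arange(200)                           # injection order
drift = 1.0 + 0.003 * order                      # slow intra-batch drift
intensity = drift * rng.lognormal(0, 0.05, 200)  # one metabolite peak

qc = order % 10 == 0                             # every 10th run is a QC
svr = SVR(kernel="rbf", C=10.0, gamma=0.001)
svr.fit(order[qc].reshape(-1, 1), intensity[qc])

predicted_drift = svr.predict(order.reshape(-1, 1))
normalized = intensity / predicted_drift * intensity[qc].mean()

def rsd(v):
    return v.std() / v.mean() * 100              # relative standard deviation

print(f"QC RSD before: {rsd(intensity[qc]):.1f}%  after: {rsd(normalized[qc]):.1f}%")
```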
12.
MOTIVATION: With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes simultaneously in one single experiment. One current difficulty in interpreting microarray data comes from their innate nature of 'high-dimensional low sample size'. Therefore, robust and accurate gene selection methods are required to identify differentially expressed group of genes across different samples, e.g. between cancerous and normal cells. Successful gene selection will help to classify different cancer types, lead to a better understanding of genetic signatures in cancers and improve treatment strategies. Although gene selection and cancer classification are two closely related problems, most existing approaches handle them separately by selecting genes prior to classification. We provide a unified procedure for simultaneous gene selection and cancer classification, achieving high accuracy in both aspects. RESULTS: In this paper we develop a novel type of regularization in support vector machines (SVMs) to identify important genes for cancer classification. A special nonconvex penalty, called the smoothly clipped absolute deviation penalty, is imposed on the hinge loss function in the SVM. By systematically thresholding small estimates to zeros, the new procedure eliminates redundant genes automatically and yields a compact and accurate classifier. A successive quadratic algorithm is proposed to convert the non-differentiable and non-convex optimization problem into easily solved linear equation systems. The method is applied to two real datasets and has produced very promising results. AVAILABILITY: MATLAB codes are available upon request from the authors.
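For reference, here is a sketch of the smoothly clipped absolute deviation (SCAD) penalty in the standard Fan-Li form with the conventional a = 3.7; the paper imposes this penalty on the SVM hinge loss, which is not reproduced here.

```python
# Reference sketch of the SCAD penalty mentioned above (Fan & Li's form,
# conventional a = 3.7); the hinge-loss SVM it is attached to in the
# paper is not reproduced here.
import numpy as np

def scad_penalty(theta, lam=1.0, a=3.7):
    """Piecewise SCAD penalty: L1-like near zero, constant for large
    coefficients, so big effects are not over-shrunk."""
    t = np.abs(np.asarray(theta, dtype=float))
    small = lam * t
    mid = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    large = np.full_like(t, lam**2 * (a + 1) / 2)
    return np.where(t <= lam, small, np.where(t <= a * lam, mid, large))

print(scad_penalty([0.5, 2.0, 10.0]))  # grows, bends, then flattens
```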
13.
Background: Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time, and other limitations of current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains rather than whole molecules, predicting domain-domain interactions (DDIs) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles.
17.
We present the novel prokaryotic gene finder GISMO, which combines searches for protein family domains with composition-based classification using a support vector machine. GISMO is highly accurate, exhibiting high sensitivity and specificity in gene identification. We found that it performs well for complete prokaryotic chromosomes, irrespective of their GC content, as well as for plasmids as short as 10 kb, for short genes, and for genes with atypical sequence composition. Using GISMO, we obtained several thousand new predictions for published genomes that are supported by extrinsic evidence, strongly suggesting that these are biologically active genes. The source code for GISMO is freely available under the GPL license.
19.
Background: In discriminant analysis of microarray data, a small number of samples is usually described by a large number of genes. It is not only difficult but also unnecessary to conduct the discriminant analysis with all the genes; hence, gene selection is usually performed to select important genes.
20.
Background: The binding between peptide epitopes and major histocompatibility complex proteins (MHCs) is an important event in the cellular immune response. Accurate prediction of the binding between short peptides and MHC molecules has long been a principal challenge for immunoinformatics. Recently, the modeling of MHC-peptide binding has come to emphasize quantitative predictions: instead of categorizing peptides as "binders" or "non-binders", or as "strong binders" and "weak binders", recent methods seek to make predictions about precise binding affinities.