
1.

Ischemia episode detection in ECG using kernel density estimation, support vector machine and feature selection

Park J Pedrycz W Jeon M 《Biomedical engineering online》2012,11(1):30-22

ABSTRACT: BACKGROUND: Myocardial ischemia can develop into more serious diseases. Early detection of the ischemic syndrome in the electrocardiogram (ECG), more accurately and automatically, can prevent it from developing into a catastrophic disease. To this end, we propose a new method, which employs wavelets and simple feature selection. METHODS: For training and testing, the European ST-T database is used, which is comprised of 367 ischemic ST episodes in 90 records. We first remove baseline wandering, and detect time positions of QRS complexes by a method based on the discrete wavelet transform. Next, for each heart beat, we extract three features which can be used for differentiating ST episodes from normal: 1) the area between QRS offset and T-peak points, 2) the normalized and signed sum from QRS offset to effective zero voltage point, and 3) the slope from QRS onset to offset point. We average the feature values for successive five beats to reduce effects of outliers. Finally we apply classifiers to those features. RESULTS: We evaluated the algorithm by kernel density estimation (KDE) and support vector machine (SVM) methods. Sensitivity and specificity for KDE were 0.939 and 0.912, respectively. The KDE classifier detects 349 ischemic ST episodes out of total 367 ST episodes. Sensitivity and specificity of SVM were 0.941 and 0.923, respectively. The SVM classifier detects 355 ischemic ST episodes. CONCLUSIONS: We proposed a new method for detecting ischemia in ECG. It contains signal processing techniques of removing baseline wandering and detecting time positions of QRS complexes by discrete wavelet transform, and feature extraction from morphology of ECG waveforms explicitly. It was shown that the number of selected features was sufficient to discriminate ischemic ST episodes from the normal ones. We also showed how the proposed KDE classifier can automatically select kernel bandwidths, meaning that the algorithm does not require any numerical values of the parameters to be supplied in advance. In the case of the SVM classifier, one has to select a single parameter.
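The five-beat averaging step mentioned in the abstract can be sketched as below. This is only an illustration under stated assumptions, not the authors' code: the function name `average_successive_beats`, the choice of a sliding (overlapping) window, and the toy feature values are all hypothetical, since the abstract does not say whether the windows overlap.

```python
def average_successive_beats(features, window=5):
    """Average each feature over a sliding window of successive beats.

    features: list of per-beat feature tuples, e.g. (area, signed_sum, slope).
    Returns one averaged tuple per full window (len(features) - window + 1 items).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averaged = []
    for start in range(len(features) - window + 1):
        block = features[start:start + window]
        # zip(*block) groups the values of each feature across the window.
        averaged.append(tuple(sum(col) / window for col in zip(*block)))
    return averaged


# Hypothetical per-beat features; the fourth beat is an outlier in feature 0.
beats = [(1.0, 0.5), (1.0, 0.5), (1.0, 0.5), (6.0, 0.5), (1.0, 0.5), (1.0, 0.5)]
smoothed = average_successive_beats(beats)
```

In this toy input the outlier value 6.0 is pulled down to 2.0 in every window that contains it, which is the outlier-damping effect the abstract refers to.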

2.

On the classification of a small imbalanced cytogenetic image database

Lerner B Yeshaya J Koushnir L 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(2):204-215

Solving a multiclass classification task using a small imbalanced database of patterns of high dimension is difficult due to the curse-of-dimensionality and the bias of the training toward the majority classes. Such a problem has arisen while diagnosing genetic abnormalities by classifying a small database of fluorescence in situ hybridization signals of types having different frequencies of occurrence. We propose two solutions to the problem and study them experimentally in the cytogenetic domain. The first is hierarchical decomposition of the classification task, where each hierarchy level is designed to tackle a simpler problem which is represented by classes that are approximately balanced. The second solution is balancing the data by up-sampling the minority classes accompanied by dimensionality reduction. Implemented by the naive Bayesian classifier or the multilayer perceptron neural network, both solutions have diminished the problem and contributed to accuracy improvement. In addition, the experiments suggest that coping with the smallness of the data is more beneficial than dealing with its imbalance.
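The second solution, balancing by up-sampling the minority classes, could look roughly like the sketch below. It assumes plain random duplication with replacement, which is one common form of up-sampling and may differ from what the paper used; the dimensionality-reduction step is not shown. The function name `upsample_minority` and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def upsample_minority(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for y, items in by_class.items():
        # Keep every original sample, then draw extra ones with replacement.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["common", "common", "common", "common", "rare"]
X_bal, y_bal = upsample_minority(X, y)
counts = Counter(y_bal)
```

When up-sampling is combined with cross-validation, it is usually applied to the training split only, so that duplicated samples do not leak into the test split.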

3.

Towards reducing the impacts of unwanted movements on identification of motion intentions

《Journal of electromyography and kinesiology》2016

Surface electromyogram (sEMG) has been extensively used as a control signal in prosthesis devices. However, it is still a great challenge to make multifunctional myoelectric prostheses clinically available due to a number of critical issues associated with existing EMG based control strategy. One such issue is the effect of unwanted movements (UMs), performed inadvertently by users, on the performance of movement classification in EMG pattern recognition based algorithms. Since UMs are not considered in training a classifier, they degrade the performance of a trained classifier in identifying the target movements (TMs), which causes undesired actions in the control of multifunctional prostheses. In this study, the impact of UMs was systematically investigated in both able-bodied subjects and transradial amputees. Our results showed that the UMs would be unevenly classified into all classes of the TMs. To reduce the impact of the UMs on the performance of a classifier, a new training strategy that categorizes all possible UMs into a new movement class was proposed, and a metric called Reject Ratio, a measure of how many UMs are rejected by a trained classifier, was adopted. The results showed that the average Reject Ratio across all the participants was greater than 91%, while the average classification accuracy of TMs was above 99% when UMs occurred. This suggests that the proposed training strategy could greatly reduce the impact of UMs on the performance of the trained classifier in identifying the TMs and may enhance the robustness of myoelectric control in clinical applications.
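A Reject Ratio of the kind described above might be computed as in the following sketch. The abstract only says the metric measures how many UMs are rejected, so the exact formula here (the fraction of UM samples assigned to the extra reject class) is an assumption, as are the label names and the function name `reject_ratio`.

```python
def reject_ratio(predicted_for_ums, reject_label="UM"):
    """Fraction of unwanted-movement samples assigned to the reject class."""
    if not predicted_for_ums:
        raise ValueError("need at least one prediction")
    rejected = sum(1 for p in predicted_for_ums if p == reject_label)
    return rejected / len(predicted_for_ums)


# Hypothetical classifier outputs for 10 unwanted-movement windows;
# one window is mistaken for the target movement "hand_open".
preds = ["UM", "UM", "hand_open", "UM", "UM", "UM", "UM", "UM", "UM", "UM"]
ratio = reject_ratio(preds)
```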

4.

Use of supervised discretization with PCA in wavelet packet transformation-based surface electromyogram classification

Kirkpong Kiatpanichagij Nitin Afzulpurkar 《Biomedical signal processing and control》2009,4(2):127-138

This paper describes a preprocessing stage for nonlinear classifier used in wavelet packet transformation (WPT)-based multichannel surface electromyogram (EMG) classification. The preprocessing stage named sdPCA, which consists of supervised discretization coupled with principal component analysis (PCA), was developed for improving surface EMG classifier generalization ability and training speed on overlap segmented signals. The sdPCA outperforms the fast correlation-based filter (FCBF), PCA, supervised discretization, and their combinations in terms of the highest generalization ability, fast training speed, the small feature size, and an ability to reduce the risks of developing oscillation and being trapped in nonlinear classifier training. The experiments were conducted on a data set consisting of 4-channel surface EMG signals measured from 6 hand and wrist gestures of 12 subjects. The experimental results indicate that the classification system using sdPCA has the highest generalization ability along with the second fastest training speed. The classification accuracy in 12 subjects of the system using sdPCA is 93.30 ± 2.42% taking 400 epochs for training by overlap segmented signals within 100 s. This result is very attractive for further development because we can achieve high-classification accuracy for large data sets by means of the proposed sdPCA without the application of additional algorithms such as local discriminant bases (LDB), majority voting (MV), or WPT sub-bands clustering.

5.

Establishment of a five‐enzalutamide‐resistance‐related‐gene‐based classifier for recurrence‐free survival predicting of prostate cancer

Jing Chen Jialin Meng Yi Liu Zichen Bian Qingsong Niu Junyi Chen Jun Zhou Li Zhang Meng Zhang Chaozhao Liang 《Journal of cellular and molecular medicine》2022,26(21):5379

To identify prostate cancer (PCa) patients with a high risk of recurrence is critical before delivering adjuvant treatment. We developed a classifier based on the Enzalutamide treatment resistance‐related genes to assist the currently available staging system in predicting the recurrence‐free survival (RFS) prognosis of PCa patients. We overlapped the DEGs from two datasets to obtain a more convincing Enzalutamide‐resistance‐related‐gene (ERRG) cluster. The five‐ERRG‐based classifier obtained good predictive values in both the training and validation cohorts. The classifier precisely predicted RFS of patients in four cohorts, independent of patient age, pathological tumour stage, Gleason score and PSA levels. The classifier and the clinicopathological factors were combined to construct a nomogram, which had higher predictive accuracy than each variable alone. We also compared the high‐ and low‐risk subgroups and found that their differences were enriched in cancer progression‐related pathways. The five‐ERRG‐based classifier is a practical and reliable predictor, which adds value to the existing staging system for predicting the RFS prognosis of PCa after radical prostatectomy, enabling physicians to make more informed treatment decisions concerning adjuvant therapy.

6.

Data-dependent kernel machines for microarray data classification

Xiong H Zhang Y Chen XW 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(4):583-595

One important application of gene expression analysis is to classify tissue samples according to their gene expression levels. Gene expression data are typically characterized by high dimensionality and small sample size, which makes the classification task quite challenging. In this paper, we present a data-dependent kernel for microarray data classification. This kernel function is engineered so that the class separability of the training data is maximized. A bootstrapping-based resampling scheme is introduced to reduce the possible training bias. The effectiveness of this adaptive kernel for microarray data classification is illustrated with a k-Nearest Neighbor (KNN) classifier. Our experimental study shows that the data-dependent kernel leads to a significant improvement in the accuracy of KNN classifiers. Furthermore, this kernel-based KNN scheme has been demonstrated to be competitive to, if not better than, more sophisticated classifiers such as Support Vector Machines (SVMs) and the Uncorrelated Linear Discriminant Analysis (ULDA) for classifying gene expression data.

7.

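Feature extraction from gene chip (microarray) data based on wavelet low-frequency coefficients

Liu Yujie Liu Yihui 《生物信息学》2011,9(3):255-258,262

Feature extraction and classification are key problems in pattern recognition. Combining wavelet analysis theory with support vector machine theory, a classifier model is constructed that divides prostate cancer gene chip data into two classes, cancer and normal. Wavelet low-frequency coefficients are extracted to represent the original data and fed into a support vector machine classifier. Experiments show that extracting the low-frequency coefficients of a 4-level db1 wavelet decomposition and feeding them into the classifier gives a correct classification rate of 93.53%. The correct rate for the Haar wavelet is 92.94%. Thus, extracting low-frequency coefficients with different wavelets gives similar classification results.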
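Extracting low-frequency coefficients from a multi-level wavelet decomposition can be illustrated with the Haar filter, which is simple enough to write by hand. The sketch below is an assumption-laden illustration rather than the paper's pipeline: the function name `haar_approximation`, the padding rule for odd lengths, and the 32-value toy profile are all hypothetical, and wavelet libraries offer several other boundary modes.

```python
import math


def haar_approximation(signal, levels=4):
    """Return the low-frequency (approximation) Haar coefficients after `levels` steps."""
    coeffs = list(signal)
    for _ in range(levels):
        if len(coeffs) < 2:
            break
        if len(coeffs) % 2:
            coeffs.append(coeffs[-1])  # pad odd lengths by repeating the last value
        # Orthonormal Haar low-pass: scaled sum of neighbouring pairs.
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
                  for i in range(0, len(coeffs), 2)]
    return coeffs


expression = [float(v) for v in range(32)]   # hypothetical 32-value profile
low_freq = haar_approximation(expression, levels=4)
```

Each decomposition level halves the number of coefficients, so four levels turn 32 values into 2. This is the data-reduction effect that makes the low-frequency coefficients usable as compact classifier input.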

8.

Data reduction using a discrete wavelet transform in discriminant analysis of very high dimensionality data

Qu Y Adam BL Thornquist M Potter JD Thompson ML Yasui Y Davis J Schellhammer PF Cazares L Clements M Wright GL Feng Z 《Biometrics》2003,59(1):143-151

We present a method of data reduction using a wavelet transform in discriminant analysis when the number of variables is much greater than the number of observations. The method is illustrated with a prostate cancer study, where the sample size is 248, and the number of variables is 48,538 (generated using the ProteinChip technology). Using a discrete wavelet transform, the 48,538 data points are represented by 1271 wavelet coefficients. Information criteria identified 11 of the 1271 wavelet coefficients with the highest discriminatory power. The linear classifier with the 11 wavelet coefficients detected prostate cancer in a separate test set with a sensitivity of 97% and specificity of 100%.

9.

Face recognition from a single image per person using deep architecture neural networks

Tian Zhuo 《Cluster computing》2016,19(1):73-77

Implementing an accurate face recognition system requires images in different variations, and if our database is large, we suffer from problems such as storage cost and low speed in recognition algorithms. On the other hand, in some applications there is only one image available per person for training the recognition model. In this article, we propose a neural network model, inspired by the bidirectional analysis and synthesis brain network, which can learn a nonlinear mapping between image space and components space. Using a deep neural network model, we have tried to separate pose components from person components. After setting apart these components, we can use them to synthesize virtual images of test data in different pose and lighting conditions. These virtual images are used to train a neural network classifier. The results showed that training the neural classifier with virtual images gives better performance than training the classifier with frontal view images.

10.

Investigating the Effect of Flickering Frequency Pair and Mother Wavelet Selection in Steady-State Visually-Evoked Potentials on Two-Command Brain-Computer Interfaces

《IRBM》2022,43(6):594-603

Introduction: Steady-state visually evoked potentials (SSVEPs) have become popular in brain-computer interface (BCI) applications in addition to many other applications on clinical neuroscience (neurodegenerative disorders, schizophrenia, epilepsy, etc.), cognitive (visual attention, working memory, brain rhythms, etc.), and use of engineering researches. Among available methods to measure brain activities, SSVEPs have advantages like higher information transfer rate, simplicity in structure, and short training time. SSVEP-based BCIs use flickering stimuli at different frequencies to discriminate distinct commands in real life. Some features are extracted from the SSVEP signals before these commands are classified. The wavelet transform (WT) has attracted researchers among feature extraction methods since it utilizes the non-stationary signals well. In the WT, a sample function (named mother wavelet) represents the SSVEP signal in both time and frequency domains. Unfortunately, there is no universal mother wavelet function that fits all the signals. Therefore, choosing an appropriate mother wavelet function may be a challenge in WT-related studies. Although there are such studies in three- and seven-command SSVEP-based studies, there is no study for two-command systems to our knowledge. Materials and Methods: In this study, two user commands flickered at the combinations of seven different frequencies were tested to determine which frequency pairs give the highest performance. For this purpose, three well-known wavelet features (energy, entropy, and variance) were calculated for each of derived EEG frequency bands from the discrete WT coefficients of SSVEP signals. The WT was repeated for six different mother wavelet functions (Haar, Db4, Sym4, Coif1, Bior3.5, and Rbior2.8). Then, four feature sets (every three features, and all together) were applied to seven commonly-used machine learning algorithms (Decision Tree, Discriminant Analysis, Logistic Regression, Naive Bayes, Support Vector Machines, Nearest Neighbors, and Ensemble Classifiers). Results and Discussion: We achieved 100% accuracies among these 3,528 runs (7 classifiers x 4 feature sets x 6 mother wavelets x 21 flickering frequency pairs) using the mother wavelet function of Haar and the Ensemble Learner classifier. The highest classifier performances are 100% when two commands have the flickering frequency pairs of (6.0 and 10 Hz), (6.5 and 8.2 Hz), or (6.5 and 10.0 Hz). Conclusion: We obtained three main outcomes from this study. First, the most representative mother wavelet function was Haar, while the worst one was Symlet 4. Second, the Ensemble Learner classifier gave the maximum classifier performance in a two-command SSVEP-based BCI system. Besides, two user commands from SSVEP should be one of the frequency pairs of (6.0 and 10.0 Hz), (6.5 and 8.2 Hz), and (6.5 and 10.0 Hz) to achieve the maximum accuracy.
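The three wavelet features named in the abstract (energy, entropy, variance) and the size of the experiment grid can be sketched as follows. The abstract does not define the features, so the formulas here are assumptions: energy as the sum of squared coefficients, entropy as the Shannon entropy of the normalised squared coefficients, and variance as the population variance. The function name `wavelet_features` and the toy sub-band are hypothetical.

```python
import math
from itertools import combinations


def wavelet_features(coeffs):
    """Energy, Shannon entropy of normalised squared coefficients, and variance."""
    n = len(coeffs)
    if n == 0:
        raise ValueError("need at least one coefficient")
    energy = sum(c * c for c in coeffs)
    mean = sum(coeffs) / n
    variance = sum((c - mean) ** 2 for c in coeffs) / n
    entropy = 0.0
    if energy > 0:
        for c in coeffs:
            p = c * c / energy          # share of the band energy in this coefficient
            if p > 0:
                entropy -= p * math.log2(p)
    return energy, entropy, variance


band = [1.0, -1.0, 1.0, -1.0]           # hypothetical sub-band coefficients
energy, entropy, variance = wavelet_features(band)

# Experiment size quoted in the abstract: unordered pairs drawn from seven frequencies.
n_pairs = len(list(combinations(range(7), 2)))
n_runs = 7 * 4 * 6 * n_pairs
```

The pair count confirms the figures in the abstract: seven frequencies give 21 unordered pairs, and 7 classifiers x 4 feature sets x 6 mother wavelets x 21 pairs gives 3,528 runs.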

11.

Exponential decay characteristics of the stochastic integer multiple neural firing patterns

Gu H Jia B Lu Q 《Cognitive neurodynamics》2011,5(1):87-101

Integer multiple neural firing patterns exhibit multi-peaks in inter-spike interval (ISI) histogram (ISIH) and exponential decay in amplitude of peaks, which results from their stochastic mechanisms. However, in previous experimental observations the decay in the ISIH frequently shows obvious bias from the exponential law. This paper studied three typical cases of the decay, by transforming ISI series of the firing to discrete binary chain and calculating the probabilities or frequencies of symbols over the whole chain. The first case is the exponential decay without bias. An example of this case was discovered on hippocampal CA1 pyramidal neuron stimulated by external signal. Probability calculation shows that this decay without bias results from a stochastic renewal process, in which the successive spikes are independent. The second case is the exponential decay with a higher first peak, while the third case is that with a lower first peak. An example of the second case was discovered in experiment on a neural pacemaker. Simulation and calculation of the second and third cases indicate that the dependency in successive spikes of the firing leads to the bias seen in decay of ISIH peaks. The quantitative expression of the decay slope of three cases of firing patterns, as well as the excitatory effect in the second case of firing pattern and the inhibitory effect in the third case of firing pattern are identified. The results clearly reveal the mechanism of the exponential decay in ISIH peaks of a number of important neural firing patterns and provide new understanding for typical bias from the exponential decay law.
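The first case, exponential decay without bias from a renewal process, can be reproduced with a toy binary chain. The sketch below assumes a much simpler model than the paper's: in each base period the neuron fires independently with a fixed probability, so the interval lengths follow a geometric law. The firing probability 0.3, the chain length, and the function name `isi_counts` are hypothetical.

```python
import random
from collections import Counter


def isi_counts(chain):
    """Count inter-spike intervals (in base periods) in a binary spike chain."""
    counts = Counter()
    last_spike = None
    for t, fired in enumerate(chain):
        if fired:
            if last_spike is not None:
                counts[t - last_spike] += 1
            last_spike = t
    return counts


rng = random.Random(1)
p_fire = 0.3                               # hypothetical firing probability per period
chain = [rng.random() < p_fire for _ in range(400_000)]
counts = isi_counts(chain)
# For independent spikes, P(ISI = k) = p (1 - p)^(k - 1),
# so each histogram peak is about (1 - p) times the previous one.
ratios = [counts[k + 1] / counts[k] for k in range(1, 4)]
```

A constant ratio between successive peaks is exactly what exponential decay of peak amplitude means. Making the firing probability depend on the previous interval would break this constancy, which matches the abstract's explanation of the biased cases.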

12.

Rgtsp: a generalized top scoring pairs package for class prediction

Popovici V Budinská E Delorenzi M 《Bioinformatics (Oxford, England)》2011,27(12):1729-1730

SUMMARY: A top scoring pair (TSP) classifier consists of a pair of variables whose relative ordering can be used for accurately predicting the class label of a sample. This classification rule has the advantage of being easily interpretable and more robust against technical variations in data, such as those due to different microarray platforms. Here we describe a parallel implementation of this classifier which significantly reduces the training time, and a number of extensions, including a multi-class approach, which has the potential of improving the classification performance. AVAILABILITY AND IMPLEMENTATION: Full C++ source code and R package Rgtsp are freely available from http://lausanne.isb-sib.ch/~vpopovic/research/. The implementation relies on existing OpenMP libraries.
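The basic two-class TSP rule described above can be sketched in a few lines. This is a plain serial illustration, not the Rgtsp implementation or its API: the function names `train_tsp` and `predict_tsp`, the tie-breaking (first best pair wins), and the toy data are assumptions.

```python
from itertools import combinations


def train_tsp(samples, labels):
    """Pick the feature pair (i, j) whose ordering x[i] < x[j] best separates two classes."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    groups = {c: [x for x, y in zip(samples, labels) if y == c] for c in classes}
    best = None
    for i, j in combinations(range(len(samples[0])), 2):
        # Fraction of samples in each class with x[i] < x[j].
        freq = [sum(x[i] < x[j] for x in groups[c]) / len(groups[c]) for c in classes]
        score = abs(freq[0] - freq[1])
        if best is None or score > best[0]:
            # Predict the class in which the ordering x[i] < x[j] is more frequent.
            when_less = classes[0] if freq[0] > freq[1] else classes[1]
            when_not = classes[1] if when_less == classes[0] else classes[0]
            best = (score, i, j, when_less, when_not)
    return best[1:]


def predict_tsp(model, x):
    i, j, when_less, when_not = model
    return when_less if x[i] < x[j] else when_not


X = [[1, 5, 3], [2, 6, 1], [7, 2, 4], [8, 3, 9]]
y = ["A", "A", "B", "B"]
model = train_tsp(X, y)
```

Because the rule only compares two values within the same sample, multiplying a whole sample by a positive constant leaves the prediction unchanged. This is one reason such rules tolerate platform-specific scaling.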

13.

Effective sample selection for classification of pre-miRNAs

Han K 《Genetics and molecular research : GMR》2011,10(1):506-518

To solve the class imbalance problem in the classification of pre-miRNAs with the ab initio method, we developed a novel sample selection method according to the characteristics of pre-miRNAs. Real/pseudo pre-miRNAs are clustered based on their stem similarity and their distribution in high dimensional sample space, respectively. The training samples are selected according to the sample density of each cluster. Experimental results are validated by the cross-validation and other testing datasets composed of human real/pseudo pre-miRNAs. When compared with the previous method, microPred, our classifier miRNAPred is nearly 12% more accurate. The selected training samples also could be used to train other SVM classifiers, such as triplet-SVM, MiPred, miPred, and microPred, to improve their classification performance. The sample selection algorithm is useful for constructing a more efficient classifier for the classification of real pre-miRNAs and pseudo hairpin sequences.

14.

A hybrid machine-learning approach for segmentation of protein localization data

Kasson PM Huppa JB Davis MM Brunger AT 《Bioinformatics (Oxford, England)》2005,21(19):3778-3786

MOTIVATION: Subcellular protein localization data are critical to the quantitative understanding of cellular function and regulation. Such data are acquired via observation and quantitative analysis of fluorescently labeled proteins in living cells. Differentiation of labeled protein from cellular artifacts remains an obstacle to accurate quantification. We have developed a novel hybrid machine-learning-based method to differentiate signal from artifact in membrane protein localization data by deriving positional information via surface fitting and combining this with fluorescence-intensity-based data to generate input for a support vector machine. RESULTS: We have employed this classifier to analyze signaling protein localization in T-cell activation. Our classifier displayed increased performance over previously available techniques, exhibiting both flexibility and adaptability: training on heterogeneous data yielded a general classifier with good overall performance; training on more specific data yielded an extremely high-performance specific classifier. We also demonstrate accurate automated learning utilizing additional experimental data.

15.

Machine learning methods for predictive proteomics

Barla A Jurman G Riccadonna S Merler S Chierici M Furlanello C 《Briefings in bioinformatics》2008,9(2):119-128

The search for predictive biomarkers of disease from high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to set up a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data needs caution like the microarray case. The risk of overfitting and of selection bias effects is pervasive: not only potential features easily outnumber samples by 10^3 times, but it is easy to neglect information-leakage effects during preprocessing from spectra to peaks. The aim of this review is to explain how to build a general purpose design analysis protocol (DAP) for predictive proteomic profiling: we show how to limit leakage due to parameter tuning and how to organize classification and ranking on large numbers of replicate versions of the original data to avoid selection bias. The DAP can be used with alternative components, i.e. with different preprocessing methods (peak clustering or wavelet based), classifiers (e.g. Support Vector Machine (SVM)) or feature ranking methods (recursive feature elimination or I-Relief). A procedure for assessing stability and predictive value of the resulting biomarkers' list is also provided. The approach is exemplified with experiments on synthetic datasets (from the Cromwell MS simulator) and with publicly available datasets from cancer studies.
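The general idea behind avoiding selection bias, redoing the feature ranking on each training split so that held-out samples never influence which features are chosen, can be illustrated as below. This is not the DAP from the review: the ranking score (difference of class means), the nearest-centroid classifier, the interleaved folds, and all names and data are hypothetical stand-ins chosen to keep the sketch short.

```python
def rank_features(X, y, k):
    """Rank features by absolute difference of class means (labels 0/1); keep top k."""
    def mean(rows, f):
        return sum(r[f] for r in rows) / len(rows)
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    scores = [abs(mean(pos, f) - mean(neg, f)) for f in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda f: -scores[f])[:k]


def nearest_centroid(X, y, feats):
    """Return a predictor using class centroids restricted to `feats`."""
    cents = {}
    for c in (0, 1):  # assumes both classes are present in the training data
        rows = [x for x, t in zip(X, y) if t == c]
        cents[c] = [sum(r[f] for r in rows) / len(rows) for f in feats]

    def predict(x):
        dist = {c: sum((x[f] - m) ** 2 for f, m in zip(feats, cents[c])) for c in cents}
        return min(dist, key=dist.get)
    return predict


def cross_validate(X, y, n_folds=3, k=1):
    """k-fold CV in which feature ranking is redone on each training split only."""
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        train = [i for i in range(len(X)) if i not in test_idx]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        feats = rank_features(X_tr, y_tr, k)      # the held-out fold is never seen here
        predict = nearest_centroid(X_tr, y_tr, feats)
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Hypothetical data: feature 0 carries the class signal, feature 1 is noise.
X = [[2.0 * (i % 2) + 0.1 * (i % 3), 0.5 * (i % 5)] for i in range(18)]
y = [i % 2 for i in range(18)]
accuracy = cross_validate(X, y, n_folds=3, k=1)
```

Ranking the features once on the full data set and then cross-validating only the classifier would let test samples shape the feature list. That is the leakage the review warns about.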

16.

Multi-Input Distributed Classifiers for Synthetic Genetic Circuits

Oleg Kanakov Roman Kotelnikov Ahmed Alsaedi Lev Tsimring Ramón Huerta Alexey Zaikin Mikhail Ivanchenko 《PloS one》2015,10(5)

For practical construction of complex synthetic genetic networks able to perform elaborate functions it is important to have a pool of relatively simple modules with different functionality which can be compounded together. To complement engineering of very different existing synthetic genetic devices such as switches, oscillators or logical gates, we propose and develop here a design of synthetic multi-input classifier based on a recently introduced distributed classifier concept. A heterogeneous population of cells acts as a single classifier, whose output is obtained by summarizing the outputs of individual cells. The learning ability is achieved by pruning the population, instead of tuning parameters of an individual cell. The present paper is focused on evaluating two possible schemes of multi-input gene classifier circuits. We demonstrate their suitability for implementing a multi-input distributed classifier capable of separating data which are inseparable for single-input classifiers, and characterize performance of the classifiers by analytical and numerical results. The simpler scheme implements a linear classifier in a single cell and is targeted at separable classification problems with simple class borders. A hard learning strategy is used to train a distributed classifier by removing from the population any cell answering incorrectly to at least one training example. The other scheme implements a circuit with a bell-shaped response in a single cell to allow potentially arbitrary shape of the classification border in the input space of a distributed classifier. Inseparable classification problems are addressed using soft learning strategy, characterized by probabilistic decision to keep or discard a cell at each training iteration. We expect that our classifier design contributes to the development of robust and predictable synthetic biosensors, which have the potential to affect applications in a lot of fields, including that of medicine and industry. 
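The hard learning strategy for the simpler (linear) scheme can be mimicked in software as below. This is a toy abstraction, not a model of the genetic circuit: each "cell" is reduced to a random linear threshold unit, the population output is taken as a majority vote (the abstract only says the outputs are summarized), and all names, parameter ranges, and training points are hypothetical.

```python
import random


def make_population(n_cells, seed=0):
    """Hypothetical cells: each responds when w1*x1 + w2*x2 exceeds its own threshold."""
    rng = random.Random(seed)
    return [(rng.uniform(0, 1), rng.uniform(0, 1), rng.uniform(0, 2))
            for _ in range(n_cells)]


def cell_output(cell, x):
    w1, w2, theta = cell
    return 1 if w1 * x[0] + w2 * x[1] > theta else 0


def hard_learning(population, examples):
    """Remove every cell that answers at least one training example incorrectly."""
    return [c for c in population
            if all(cell_output(c, x) == label for x, label in examples)]


def classify(population, x):
    """Population output: 1 if more than half of the surviving cells respond."""
    votes = sum(cell_output(c, x) for c in population)
    return 1 if 2 * votes > len(population) else 0


train = [((0.2, 0.3), 0), ((0.4, 0.1), 0), ((1.8, 1.5), 1), ((1.6, 1.9), 1)]
cells = hard_learning(make_population(500), train)
```

Learning here changes the composition of the population rather than the parameters of any single cell, which is the distinguishing feature of the distributed classifier concept described in the abstract.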

17.

CyclinPred: a SVM-based method for predicting cyclin protein sequences

Kalita MK Nandal UK Pattnaik A Sivalingam A Ramasamy G Kumar M Raghava GP Gupta D 《PloS one》2008,3(7):e2605

Functional annotation of protein sequences with low similarity to well characterized protein sequences is a major challenge of computational biology in the post genomic era. The cyclin protein family is one such important family of proteins, consisting of sequences with low sequence similarity, which makes discovering novel cyclins and establishing orthologous relationships amongst the cyclins a difficult task. The currently identified cyclin motifs and cyclin associated domains do not represent all of the identified and characterized cyclin sequences. We describe a Support Vector Machine (SVM) based classifier, CyclinPred, which can predict cyclin sequences with high efficiency. The SVM classifier was trained with features of selected cyclin and non cyclin protein sequences. The training features of the protein sequences include amino acid composition, dipeptide composition, secondary structure composition and PSI-BLAST generated Position Specific Scoring Matrix (PSSM) profiles. Results obtained from Leave-One-Out cross validation or jackknife test, self consistency and holdout tests prove that the SVM classifier trained with features of PSSM profile was more accurate than the classifiers based on either of the other features alone or hybrids of these features. A cyclin prediction server, CyclinPred, has been set up based on the SVM model trained with PSSM profiles. CyclinPred prediction results prove that the method may be used as a cyclin prediction tool, complementing conventional cyclin prediction methods.
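Two of the sequence features listed above, amino acid composition and dipeptide composition, are simple frequency vectors and can be sketched as follows. This is an illustration, not CyclinPred's code: the residue ordering, the handling of non-standard residues, and the toy sequence are assumptions, and the PSSM and secondary-structure features are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard amino acids (20 features)."""
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among overlapping dipeptides (400 features)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    index = {a + b: k for k, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}
    vec = [0.0] * len(index)
    for p in pairs:
        if p in index:                      # skip pairs with non-standard residues
            vec[index[p]] += 1 / len(pairs)
    return vec


toy = "MKKLAK"                              # hypothetical sequence, not a real cyclin
aac = amino_acid_composition(toy)
dpc = dipeptide_composition(toy)
```

Both vectors have a fixed length regardless of sequence length (20 and 400 entries), which is what allows proteins of different sizes to be fed to the same SVM.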

18.

Anoxia can increase the rate of decay for cnidarian tissue: Using Actinia equina to understand the early fossil record

Anthony D. Hancy Jonathan B. Antcliffe 《Geobiology》2020,18(2):167-184

An experimental decay methodology is developed for a cnidarian model organism to serve as a comparison to the many previous such studies on bilaterians. This allows an examination of inherent bias against the fossilisation of cnidarian tissue and their diagnostic characters, under what conditions these occur, and in what way. The decay sequence of Actinia equina was examined under a series of controlled conditions. These experiments show that cnidarian decay begins with an initial rupturing of the epidermis, followed by rapid loss of recognisable internal morphological characters. This suggests that bacteria work quicker on the epidermis than autolysis does on the internal anatomy. The data also show that diploblastic tissue is not universally decayed more slowly under anoxic or reducing conditions than under oxic conditions. Indeed, some cnidarian characters decay more rapidly under anoxic conditions than they do under oxic conditions. This suggests the decay pathways acting may be different to those affecting soft bilaterian tissue such as soft epidermis and internal organs. What is most important in the decay of soft polyp anatomy is the microbial community, which can be dominated by oxic or anoxic bacteria. Different Lagerstätten, even of the same type, will inevitably have subtle differences in their bacterial communities, which, among other factors, could be a control on soft polyp preservation, leading to either an absence of compelling soft anthozoans (Burgess Shale) or an astonishing abundance (Qingjiang biota).

19.

Precambrian animal life: Taphonomy of phosphatized metazoan embryos from southwest China

DORNBOS STEPHEN BOTTJER DAVID JUN-YUAN CHEN OLIVERI PAOLA GAO FENG LI CHIA-WEI 《Lethaia: An International Journal of Palaeontology and Stratigraphy》2005,38(2):101-109

Phosphatized fossils from the Neoproterozoic Doushantuo Formation have provided valuable insight into the early evolution of metazoans, but the preservation of these spectacular fossils is not yet fully understood. This research begins to address this issue by performing a detailed specimen-based taphonomic analysis of the Doushantuo Formation phosphatized metazoan embryos. A total of 206 embryos in 65 thin sections from the Weng'an Phosphorite Member of the Doushantuo Formation were examined and their levels of pre-phosphatization decay estimated. The data produced from this examination reveal a strong taphonomic bias toward earlier (2-cell and 4-cell) cleavage stages, which tend to be well-preserved, and away from later (8-cell and 16-cell) cleavage stages, which tend to exhibit evidence for slight to intense levels of organic decay. In addition, the natural abundances of these embryos tend to decrease with advancement in cleavage stage, and no evidence of more advanced (beyond 16-cell) cleavage stages or eventual adult forms was found in this study. One possible explanation for this taphonomic bias toward early cleavage stages is that later cleavage stages and adult forms were more physically delicate, allowing them to be more easily damaged during burial and reworking, allowing for more rapid decay. The spectacular preservation of these embryos was probably aided by their likely internal enrichment in phosphate-rich yolk, which would have caused their internal dissolved phosphate levels to reach critical levels with only minuscule organic decay, thereby hastening phosphatization. If internal sources of phosphate did indeed play a role in the phosphatization of these embryos, it may explain their prolific abundance in these rocks compared to other phosphatized fossils as well as indicating that metazoans lacking such internal phosphate sources were likely much more difficult to preserve. The phosphatic fossils of the Doushantuo Formation, therefore, provide an indispensable, yet restricted, window into Neoproterozoic life and metazoan origins.

20.

The top-scoring 'N' algorithm: a generalized relative expression classification method from small numbers of biomolecules

AT Magis ND Price 《BMC bioinformatics》2012,13(1):227

ABSTRACT: BACKGROUND: Relative expression algorithms such as the top-scoring pair (TSP) and the top-scoring triplet (TST) have several strengths that distinguish them from other classification methods, including resistance to overfitting, invariance to most data normalization methods, and biological interpretability. The top-scoring 'N' (TSN) algorithm is a generalized form of other relative expression algorithms which uses generic permutations and a dynamic classifier size to control both the permutation and combination space available for classification. RESULTS: TSN was tested on nine cancer datasets, showing statistically significant differences in classification accuracy between different classifier sizes (choices of N). TSN also performed competitively against a wide variety of different classification methods, including artificial neural networks, classification trees, discriminant analysis, k-Nearest neighbor, naive Bayes, and support vector machines, when tested on the Microarray Quality Control II datasets. Furthermore, TSN exhibits low levels of overfitting on training data compared to other methods, giving confidence that results obtained during cross validation will be more generally applicable to external validation sets. CONCLUSIONS: TSN preserves the strengths of other relative expression algorithms while allowing a much larger permutation and combination space to be explored, potentially improving classification accuracies when fewer numbers of measured features are available.