Similar documents
 20 similar documents found (search time: 15 ms)
1.
Remote sensing technologies are widely employed in the dynamic monitoring of land. This paper presents an algorithm named fuzzy nonlinear proximal support vector machine (FNPSVM), based on ETM+ remote sensing imagery. The algorithm is applied to extract various land types in the city of Da’an in northern China. Two multi-category strategies for the algorithm, “one-against-one” and “one-against-rest”, are described in detail and compared. A fuzzy membership function is presented to reduce the effects of noise and outliers in the data samples. The approaches to feature extraction and feature selection, and several key parameter settings, are also given. Numerous experiments were carried out to evaluate its performance, including accuracy (overall accuracy and kappa coefficient), stability, training speed, and classification speed. The FNPSVM classifier was compared with three other classifiers, namely the maximum likelihood classifier (MLC), the back propagation neural network (BPN), and the proximal support vector machine (PSVM), under different training conditions. The impacts of the selection of training samples, testing samples, and features on the four classifiers were also evaluated in these experiments.
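The two multi-category strategies named in this abstract differ in how many binary problems they create. The following is a minimal sketch of that bookkeeping only, not the FNPSVM classifier itself; the land-cover labels and function names are hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def one_against_one_pairs(classes):
    """One binary classifier per unordered pair of classes."""
    return list(combinations(classes, 2))


def one_against_rest_tasks(classes):
    """One binary classifier per class, separating it from all other classes."""
    return [(c, [o for o in classes if o != c]) for c in classes]


# Hypothetical land-cover labels (not taken from the paper).
land_classes = ["water", "farmland", "grassland", "built-up", "saline land"]

ovo = one_against_one_pairs(land_classes)
ovr = one_against_rest_tasks(land_classes)

# With k classes, one-against-one needs k*(k-1)/2 classifiers and
# one-against-rest needs k: here 10 and 5.
print(len(ovo), len(ovr))
```

One-against-one trains more classifiers, each on a smaller two-class subset; one-against-rest trains fewer, each on all samples.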

2.
MOTIVATION: Subcellular protein localization data are critical to the quantitative understanding of cellular function and regulation. Such data are acquired via observation and quantitative analysis of fluorescently labeled proteins in living cells. Differentiation of labeled protein from cellular artifacts remains an obstacle to accurate quantification. We have developed a novel hybrid machine-learning-based method to differentiate signal from artifact in membrane protein localization data by deriving positional information via surface fitting and combining this with fluorescence-intensity-based data to generate input for a support vector machine. RESULTS: We have employed this classifier to analyze signaling protein localization in T-cell activation. Our classifier displayed increased performance over previously available techniques, exhibiting both flexibility and adaptability: training on heterogeneous data yielded a general classifier with good overall performance; training on more specific data yielded an extremely high-performance specific classifier. We also demonstrate accurate automated learning utilizing additional experimental data.

3.
The aim of this study was the development, evaluation and analysis of a neuro-fuzzy classifier for a supervised and hard classification of coastal environmental vulnerability due to marine aquaculture using minimal training sets within a Geographic Information System (GIS). The neuro-fuzzy classification model NEFCLASS-J was used to develop learning algorithms that create the structure (rule base) and the parameters (fuzzy sets) of a fuzzy classifier from a set of labeled data. The training sites were manually classified into four categories of coastal environmental vulnerability through meetings and interviews with experts having field experience and specific knowledge of the environmental problems investigated. Inter-class separability estimations were performed on the training data set to assess the difficulty of the class separation problem under investigation. The two training data sets did not follow the assumptions of multivariate normality. For this reason, Bhattacharyya and Jeffries–Matusita distances were used to estimate the probability of correct classification. Further evaluation of the quality of the classification showed low values of quantity and allocation disagreement and a good overall accuracy. For each of the four classes, the user's and producer's accuracies were between 77% and 100%. In conclusion, the use of a neuro-fuzzy classifier for a supervised and hard classification of coastal environmental vulnerability demonstrated an ability to derive an accurate and reliable classification using a minimal number of training sets.
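The two separability measures named in this abstract are closely related. As a rough sketch, assuming each class is summarized by a discrete probability distribution (for example a normalized histogram), the Bhattacharyya distance is B = -ln(sum of sqrt(p_i * q_i)) and one common form of the Jeffries–Matusita distance is JM = 2(1 - e^(-B)), which ranges from 0 (identical) to 2 (fully separable). The abstract does not say which estimator the authors used, so the histogram form and the function names below are assumptions.

```python
import math


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions p and q."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    if bc == 0.0:
        return math.inf  # no overlap at all
    return -math.log(bc)


def jeffries_matusita(p, q):
    """JM distance in the 2 * (1 - exp(-B)) convention, bounded by [0, 2]."""
    return 2.0 * (1.0 - math.exp(-bhattacharyya_distance(p, q)))


# Illustrative histograms only (not data from the study).
same = jeffries_matusita([0.5, 0.5], [0.5, 0.5])
disjoint = jeffries_matusita([1.0, 0.0], [0.0, 1.0])
partial = jeffries_matusita([0.7, 0.2, 0.1], [0.1, 0.3, 0.6])
print(same, disjoint, partial)
```

Because JM saturates at 2, it is often preferred over the unbounded Bhattacharyya distance when comparing many class pairs.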

4.
MOTIVATION: Classification is widely used in medical applications. However, the quality of the classifier depends critically on the accurate labeling of the training data, and for many medical applications, labeling a sample or grading a biopsy can be subjective. Existing studies confirm this phenomenon and show that even a very small number of mislabeled samples can severely degrade the performance of the resulting classifier, particularly when the sample size is small. The problem we address in this paper is to develop a method for automatically detecting samples that are possibly mislabeled. RESULTS: We propose two algorithms for detecting possibly mislabeled samples: a classification-stability algorithm and a leave-one-out-error-sensitivity algorithm. For both algorithms, the key structure is the computation of the leave-one-out perturbation matrix. The classification-stability algorithm is based on measuring the stability of the label of a sample with respect to label changes of other samples, and the version of this algorithm based on the support vector machine appears to be quite accurate for three real datasets. The suspect list produced by this version is of high quality. Furthermore, when human intervention is not available, the correction heuristic appears to be beneficial.

5.
Proteins can move from blood circulation into salivary glands through active transportation, passive diffusion or ultrafiltration; some of them are then released into saliva and hence can potentially serve as biomarkers for diseases if accurately identified. We present a novel computational method for predicting salivary proteins that come from circulation. The basis for the prediction is a set of physicochemical and sequence features that we found to discriminate between human proteins known to be movable from circulation to saliva and proteins deemed not to be in saliva. A classifier was trained on these features using a support vector machine to predict protein secretion into saliva. The classifier achieved 88.56% average recall and 90.76% average precision in 10-fold cross-validation on the training data, indicating that the selected features are informative. Considering the possibility that our negative training data (i.e., proteins predicted to be not in saliva) may not be highly reliable, we have also trained a ranking method, based on the same features, that aims to rank the known salivary proteins from circulation highest among the proteins in the general background. This prediction capability can be used to predict potential biomarker proteins for specific human diseases when coupled with information on differentially expressed proteins in diseased versus healthy control tissues and a prediction capability for blood-secretory proteins. Using such integrated information, we predicted 31 candidate biomarker proteins in saliva for breast cancer.

6.
Hong CS, Cui J, Ni Z, Su Y, Puett D, Li F, Xu Y. PLoS ONE, 2011, 6(2): e16875
A novel computational method for prediction of proteins excreted into urine is presented. The method is based on the identification of a list of distinguishing features between proteins found in the urine of healthy people and proteins deemed not to be urine excretory. These features are used to train a classifier to distinguish the two classes of proteins. When used in conjunction with information on which proteins are differentially expressed in diseased tissues of a specific type versus control tissues, this method can be used to predict potential urine markers for the disease. Here we report the detailed algorithm of this method and an application to identification of urine markers for gastric cancer. The performance of the trained classifier on 163 proteins was experimentally validated using antibody arrays, achieving >80% true positive rate. By applying the classifier to differentially expressed genes in gastric cancer vs normal gastric tissues, it was found that endothelial lipase (EL) was substantially suppressed in the urine samples of 21 gastric cancer patients versus 21 healthy individuals. Overall, we have demonstrated that our predictor for urine excretory proteins is highly effective and could potentially serve as a powerful tool in searches for disease biomarkers in urine in general.

7.
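Because gene expression data have high attribute dimensionality and few samples, the Fisher classifier does not perform very well on such data. This paper proposes Fisher-List, an improved version of the Fisher algorithm. Its distinctive feature is that it determines a decision threshold for each class, and each threshold incorporates both overall sample information and information from certain individual samples that are crucial to classification. Experiments show that the new algorithm outperforms Fisher, LogitBoost, AdaBoost, k-nearest neighbors, decision trees, and support vector machines in classifying gene expression data.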

8.
The subcellular locations of proteins are important functional annotations. An effective and reliable subcellular localization method is necessary for proteomics research. This paper introduces a new method, PairProSVM, to automatically predict the subcellular locations of proteins. The profiles of all protein sequences in the training set are constructed by PSI-BLAST, and the pairwise profile-alignment scores are used to form feature vectors for training a support vector machine (SVM) classifier. It was found that PairProSVM outperforms the methods that are based on sequence alignment and amino-acid compositions even if most of the homologous sequences have been removed. This paper also demonstrates that the performance of PairProSVM is sensitive (and somewhat proportional) to the degree to which its kernel matrix satisfies Mercer's condition. PairProSVM was evaluated on Reinhardt and Hubbard's, Huang and Li's, and Gardy et al.'s protein datasets. The overall accuracies on these three datasets reach 99.3%, 76.5%, and 91.9%, respectively, which are higher than or comparable to those obtained by sequence alignment and by the methods compared in this paper.

9.
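Protein fold recognition is an important method for analyzing protein structure. Using proteins with low sequence similarity as the training set, information such as sequence information frequencies and hydrophobicity was extracted from protein sequences as fold-type features; a dataset covering 1,393 folds was built from proteins already classified in the SCOP database, and an SVM was used to predict the 1,393 folds. Accuracy reached 99.6122% in the closed test and 79.6329% in the SCOP-based open test. In an open test on another authoritative test set, fold-level accuracy reached 64.7059% and SCOP class-level accuracy reached 76.4706%. The method can effectively predict protein folds and thus provides a reference for ab initio protein prediction.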

10.
Apoptosis proteins have a central role in the development and the homeostasis of an organism. These proteins are very important for understanding the mechanism of programmed cell death. The function of an apoptosis protein is closely related to its subcellular location. It is crucial to develop powerful tools to predict apoptosis protein locations because of the rapidly increasing gap between the number of known structural proteins and the number of known sequences in protein databanks. In this study, amino acid pair compositions with different spacings are used to construct feature sets representing protein samples, and a feature selection approach based on binary particle swarm optimization is applied to extract effective features. An ensemble classifier is used as the prediction engine, in which the basic classifier is the fuzzy K-nearest neighbor. Each basic classifier is trained with a different feature set. Two datasets often used in prior work are selected to validate the performance of the proposed approach. The results obtained by the jackknife test are quite encouraging, indicating that the proposed method might become a useful tool for predicting the subcellular location of apoptosis proteins, or at least can play a complementary role to the existing methods in the relevant areas. The supplementary information and software written in Matlab are available by contacting the corresponding author.

11.
In this paper, a method for automatic construction of a fuzzy rule-based system from numerical data using the Incremental Learning Fuzzy Neural (ILFN) network and the genetic algorithm is presented. The ILFN network was developed for pattern classification applications. The ILFN network, which employs fuzzy sets and neural network theory, is equipped with a fast, one-pass, on-line, and incremental learning algorithm. After training, the ILFN network stores numerical knowledge in hidden units, which can then be directly interpreted as if-then rule bases. However, the rules extracted from the ILFN network are not in an optimized fuzzy linguistic form. In this paper, a knowledge base for a fuzzy expert system is extracted from the hidden units of the ILFN classifier. A genetic algorithm is then invoked, in an iterative manner, to reduce the number of rules and select only the discriminating features from the input patterns needed to provide a fuzzy rule-based system. Three computer simulations were performed, using simulated 2-D 3-class data, the well-known Fisher's Iris data set, and the Wisconsin breast cancer data set. The fuzzy rule-based system derived from the proposed method achieved 100% and 97.33% correct classification on the 75 training patterns and 75 test patterns, respectively. For the Wisconsin breast cancer data set, using 400 patterns for training and 299 patterns for testing, the derived fuzzy rule-based system achieved 99.5% and 98.33% correct classification on the training set and the test set, respectively.

12.

Objectives

Epidermal growth factor receptor (EGFR) gene mutations in tumors predict tumor response to EGFR tyrosine kinase inhibitors (EGFR-TKIs) in non-small-cell lung cancer (NSCLC). However, obtaining tumor tissue for mutation analysis is challenging. Here, we aimed to detect serum peptides/proteins associated with EGFR gene mutation status, and test whether a classification algorithm based on serum proteomic profiling could be developed to analyze EGFR gene mutation status to aid therapeutic decision-making.

Patients and Methods

Serum collected from 223 stage IIIB or IV NSCLC patients with known EGFR gene mutation status in their tumors prior to therapy was analyzed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) and ClinProTools software. Differences in serum peptides/proteins between patients with EGFR gene TKI-sensitive mutations and wild-type EGFR genes were detected in a training group of 100 patients; based on this analysis, a serum proteomic classification algorithm was developed to classify EGFR gene mutation status and tested in an independent validation group of 123 patients. The correlation between EGFR gene mutation status, as identified with the serum proteomic classifier and response to EGFR-TKIs was analyzed.

Results

Nine peptide/protein peaks were significantly different between NSCLC patients with EGFR gene TKI-sensitive mutations and wild-type EGFR genes in the training group. A genetic algorithm model consisting of five peptides/proteins (m/z 4092.4, 4585.05, 1365.1, 4643.49 and 4438.43) was developed from the training group to separate patients with EGFR gene TKI-sensitive mutations and wild-type EGFR genes. The classifier exhibited a sensitivity of 84.6% and a specificity of 77.5% in the validation group. In the 81 patients from the validation group treated with EGFR-TKIs, 28 (59.6%) of 47 patients whose matched samples were labeled as “mutant” by the classifier and 3 (8.8%) of 34 patients whose matched samples were labeled as “wild” achieved an objective response (p<0.0001). Patients whose matched samples were labeled as “mutant” by the classifier had a significantly longer progression-free survival (PFS) than patients whose matched samples were labeled as “wild” (p=0.001).

Conclusion

Peptides/proteins related to EGFR gene mutation status were found in the serum. Classification of EGFR gene mutation status using the serum proteomic classifier established in the present study in patients with stage IIIB or IV NSCLC is feasible and may predict tumor response to EGFR-TKIs.
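The response rates quoted in the Results of this abstract can be re-derived from the reported counts. The sketch below only checks that arithmetic; the helper names are hypothetical, and the sensitivity and specificity functions are included for definition only, since the abstract does not report the underlying confusion-matrix counts.

```python
def percent(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100.0 * numerator / denominator, 1)


def sensitivity(true_pos, false_neg):
    """Share of truly mutant cases that the classifier labels as mutant."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of truly wild-type cases that the classifier labels as wild."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the abstract: objective responders / patients per label.
mutant_rate = percent(28, 47)   # patients labeled "mutant" by the classifier
wild_rate = percent(3, 34)      # patients labeled "wild" by the classifier
treated_total = 47 + 34         # EGFR-TKI-treated patients in the validation group

print(mutant_rate, wild_rate, treated_total)  # 59.6 8.8 81
```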

13.
A major challenge in biomedical studies in recent years has been the classification of gene expression profiles into categories, such as cases and controls. This is done by first training a classifier on a training set containing labeled samples from the two populations, and then using that classifier to predict the labels of new samples. Such predictions have recently been shown to improve the diagnosis and treatment selection practices for several diseases. This procedure is complicated, however, by the high dimensionality of the data. While microarrays can measure the levels of thousands of genes per sample, case-control microarray studies usually involve no more than several dozen samples. Standard classifiers do not work well in these situations, where the number of features (gene expression levels measured in these microarrays) far exceeds the number of samples. Selecting only the features that are most relevant for discriminating between the two categories can help construct better classifiers, in terms of both accuracy and efficiency. In this work we developed a novel method for multivariate feature selection based on the Partial Least Squares algorithm. We compared the method's variants with common feature selection techniques across a large number of real case-control datasets, using several classifiers. We demonstrate the advantages of the method and the preferable combinations of classifier and feature selection technique.

14.
The current study aimed to evaluate physical training effects. For this purpose, a classifier was implemented using biomechanical features selected from force-plate measurements and a neurofuzzy algorithm for data management and relevant decision-making. Measurements included two sets of sit-to-stand (STS) trials involving two homogeneous groups of elderly subjects, experimental and control. They were carried out before and after a 12-week heavy resistance strength-training program undergone by the experimental group. Pre- and post-training differences were analysed, and percentages of membership of the "trained" and "untrained" fuzzy sets were calculated. The method was shown to be appropriate for detecting significant training-related changes. Detection accuracy was higher than 87%. Slightly weaker results were obtained using a neural approach, suggesting the need for a larger sample size. In conclusion, the use of a set of biomechanical features and of a neurofuzzy algorithm made it possible to propose a global score for evaluating the effectiveness of a specific training program.

15.
It is a critical challenge to develop automated methods for quickly and accurately determining the structures of proteins because of the widening gap between the number of sequence-known proteins and that of structure-known proteins in the post-genomic age. Knowledge of the protein structural class can provide useful information towards the determination of protein structure. Thus, it is highly desirable to develop computational methods for identifying the structural classes of newly found proteins based on their primary sequence. In this study, according to the concept of Chou's pseudo amino acid composition (PseAA), eight PseAA vectors are used to represent protein samples. Each of the PseAA vectors is a 40-D (dimensional) vector, which is constructed from the conventional amino acid composition (AA) and a series of sequence-order correlation factors as originally introduced by Chou. The difference among the eight PseAA representations is that different physicochemical properties are used to incorporate the sequence-order effects for the protein samples. Based on this framework, a dual-layer fuzzy support vector machine (FSVM) network is proposed to predict protein structural classes. In the first layer of the FSVM network, eight FSVM classifiers trained on different PseAA vectors are established. The second-layer FSVM classifier is applied to reclassify the outputs of the first layer. The results thus obtained are quite promising, indicating that the new method may become a useful tool for predicting not only the structural classification of proteins but also their other attributes.
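A 40-D pseudo amino acid composition vector of the kind described in this abstract combines 20 composition terms with 20 sequence-order correlation factors. The following is a simplified sketch using a single physicochemical property; the Kyte-Doolittle hydropathy scale, the weight `w = 0.05`, and the function names are assumptions for illustration, and the paper's eight property sets are not reproduced here.

```python
import math

# Kyte-Doolittle hydropathy values, used here only as a stand-in property.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}
AMINO_ACIDS = sorted(KD)


def standardize(scale):
    """Rescale a property table to zero mean and unit standard deviation."""
    mean = sum(scale.values()) / len(scale)
    std = math.sqrt(sum((v - mean) ** 2 for v in scale.values()) / len(scale))
    return {aa: (v - mean) / std for aa, v in scale.items()}


def pseaa(seq, lam=20, w=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    h = standardize(KD)
    # Conventional amino acid composition (20 terms).
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # k-th tier sequence-order correlation factor, k = 1..lam.
    thetas = [
        sum((h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Arbitrary example sequence of standard residues (illustrative only).
vector = pseaa("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(len(vector), sum(vector))
```

Swapping `KD` for another property table would give a different 40-D representation of the same protein, which is the kind of variation the eight first-layer classifiers are trained on.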

16.
Sparse methods have a significant advantage in making gene expression data more interpretable and comprehensible. Many sparse methods, such as penalized matrix decomposition (PMD) and sparse principal component analysis (SPCA), have been applied to extract core genes in plants. Supervised algorithms, especially the support vector machine-recursive feature elimination (SVM-RFE) method, generally perform well in gene selection. In this paper, we introduce class information via the total scatter matrix and put forward a class-information-based penalized matrix decomposition (CIPMD) method to improve the gene identification performance of the PMD-based method. Firstly, the total scatter matrix is obtained based on different samples of the gene expression data. Secondly, a new data matrix is constructed by decomposing the total scatter matrix. Thirdly, the new data matrix is decomposed by PMD to obtain the sparse eigensamples. Finally, the core genes are identified according to the nonzero entries in the eigensamples. The results on simulated data show that the CIPMD method achieves higher identification accuracy than conventional gene identification methods. Moreover, the results on real gene expression data demonstrate that the CIPMD method can identify more core genes closely related to abiotic stresses than the other methods.
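The final step above, reading core genes off the nonzero entries of a sparse eigensample, can be illustrated with the soft-thresholding operator that L1-penalized decompositions such as PMD commonly rely on. This is a sketch of that one operator only, not the CIPMD method; the loading values and the threshold are made up for illustration.

```python
def soft_threshold(values, delta):
    """Shrink each value toward zero by delta; values within delta become zero."""
    return [(1.0 if v > 0 else -1.0) * max(abs(v) - delta, 0.0) for v in values]


def nonzero_indices(values):
    """Indices of entries that survive the thresholding (candidate core genes)."""
    return [i for i, v in enumerate(values) if v != 0.0]


# Hypothetical loadings for four genes (not data from the paper).
loadings = [0.9, -0.2, 0.05, -1.4]
sparse = soft_threshold(loadings, delta=0.3)
selected = nonzero_indices(sparse)
print(sparse, selected)
```

A larger `delta` zeroes out more entries, so fewer genes are selected.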

17.
To solve the class imbalance problem in the classification of pre-miRNAs with the ab initio method, we developed a novel sample selection method according to the characteristics of pre-miRNAs. Real/pseudo pre-miRNAs are clustered based on their stem similarity and their distribution in high dimensional sample space, respectively. The training samples are selected according to the sample density of each cluster. Experimental results are validated by cross-validation and on other testing datasets composed of human real/pseudo pre-miRNAs. When compared with the previous method, microPred, our classifier miRNAPred is nearly 12% more accurate. The selected training samples can also be used to train other SVM classifiers, such as triplet-SVM, MiPred, miPred, and microPred, to improve their classification performance. The sample selection algorithm is useful for constructing a more efficient classifier for the classification of real pre-miRNAs and pseudo hairpin sequences.

18.
BACKGROUND: Comparative genomic hybridization (CGH) is a relatively new molecular cytogenetic method for detecting chromosomal imbalance. Karyotyping of human metaphases is an important step to assign each chromosome to one of 23 or 24 classes (22 autosomes and two sex chromosomes). Automatic karyotyping in CGH analysis is needed. However, conventional karyotyping approaches based on DAPI images require complex image enhancement procedures. METHODS: This paper proposes a simple feature extraction method, one that generates density profiles from original true color CGH images and uses normalized profiles as feature vectors without quantization. A classifier is developed using a support vector machine (SVM). It has good generalization ability and needs only limited training samples. RESULTS: Experimental results show that the feature extraction method using color information in CGH images can greatly improve the classification success rate. The SVM classifier is able to acquire knowledge about human chromosomes from relatively few samples and has good generalization ability. A success rate of more than 90% has been achieved, and the time for training and testing is very short. CONCLUSIONS: The feature extraction method proposed here and the SVM-based classifier offer a promising computerized intelligent system for automatic karyotyping of CGH human chromosomes.

19.
Ensemble classifier for protein fold pattern recognition (cited 4 times in total: 0 self-citations, 4 citations by others)
MOTIVATION: Prediction of protein folding patterns is one level deeper than that of protein structural classes, and hence is much more complicated and difficult. To deal with such a challenging problem, the ensemble classifier was introduced. It was formed by a set of basic classifiers, each trained on a different parameter system, such as predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, as well as different dimensions of pseudo-amino acid composition, which were extracted from a training dataset. The operation engine for the constituent individual classifiers was the OET-KNN (optimized evidence-theoretic k-nearest neighbors) rule. Their outcomes were combined through weighted voting to give a final determination for classifying a query protein. The recognition task was to find the true fold among the 27 possible patterns. RESULTS: The overall success rate thus obtained was 62% for a testing dataset where most of the proteins have <25% sequence identity with the proteins used in training the classifier. Such a rate is 6-21% higher than the corresponding rates obtained by various existing NN (neural networks) and SVM (support vector machines) approaches, implying that the ensemble classifier is very promising and might become a useful vehicle in protein science, as well as proteomics and bioinformatics. AVAILABILITY: The ensemble classifier, called PFP-Pred, is available as a web-server at http://202.120.37.186/bioinf/fold/PFP-Pred.htm for public usage.
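The weighted voting step described in this abstract can be sketched in a few lines. This is a generic illustration, not the PFP-Pred implementation: the fold labels, the weights, and the function name are hypothetical, and the abstract does not state how the weights were chosen.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Four hypothetical basic classifiers voting for fold indices in 1..27.
predictions = [3, 7, 3, 12]
weights = [0.2, 0.5, 0.25, 0.1]

# Fold 3 collects 0.45 in total, fold 7 collects 0.5, fold 12 collects 0.1.
final_fold = weighted_vote(predictions, weights)
print(final_fold)  # 7
```

Note that the winner here is not the majority label: two classifiers vote for fold 3, but the single vote for fold 7 carries more weight.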

20.
BACKGROUND: Artificial neural networks (ANNs) have been shown to be valuable in the analysis of analytical flow cytometric (AFC) data in aquatic ecology. Automated extraction of clusters is an important first stage in deriving ANN training data from field samples, but AFC data pose a number of challenges for many types of clustering algorithm. The fuzzy k-means algorithm has recently been extended to address nonspherical clusters with the use of scatter matrices. Four variants were proposed, each optimizing a different measure of clustering "goodness." METHODS: With AFC data obtained from marine phytoplankton species in culture, the four fuzzy k-means algorithm variants were compared with each other and with another multivariate clustering algorithm based on critical distances currently used in flow cytometry. RESULTS: One of the algorithm variants (adaptive distances, also known as the Gustafson-Kessel algorithm) was found to be robust and reliable, whereas the others showed various problems. CONCLUSIONS: The adaptive distances algorithm was superior in use to the clustering algorithms against which it was tested, but the problem of automatic determination of the number of clusters remains to be addressed.
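In fuzzy k-means, each point receives a graded membership in every cluster rather than a single label. The sketch below shows the standard membership update for one point, given its squared distances to the cluster centers. In the adaptive distances (Gustafson-Kessel) variant those squared distances would come from a cluster-specific norm built from each cluster's scatter matrix; the function simply takes them as input. The fuzzifier `m = 2.0` and the example distances are assumptions.

```python
def fuzzy_memberships(sq_distances, m=2.0):
    """Membership of one point in each cluster from its squared distances."""
    zero_hits = [d == 0.0 for d in sq_distances]
    if any(zero_hits):
        # The point coincides with one or more centers: share membership among them.
        share = 1.0 / sum(zero_hits)
        return [share if hit else 0.0 for hit in zero_hits]
    exponent = 1.0 / (m - 1.0)
    inverse = [(1.0 / d) ** exponent for d in sq_distances]
    total = sum(inverse)
    return [v / total for v in inverse]


# A point at squared distance 1 from cluster A and 4 from cluster B.
memberships = fuzzy_memberships([1.0, 4.0])
print(memberships)  # [0.8, 0.2]
```

Memberships always sum to 1, and a larger `m` makes them more even across clusters.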
