20 similar articles found (search time: 0 ms)
1.
The per capita ecological footprint (EF) is one of the most widely recognized measures of environmental sustainability. It aims to quantify the Earth's biological resources required to support human activity. In this paper, we summarize relevant previous literature and present five factors that influence per capita EF: national gross domestic product (GDP), urbanization (independent of economic development), income distribution (measured by the Gini coefficient), export dependence (measured by the percentage of exports in total GDP), and service intensity (measured by the percentage of services in total GDP). A new ecological footprint model based on a support vector machine (SVM), a machine-learning method grounded in the structural risk minimization principle of statistical learning theory, was used to calculate the per capita EF of 24 nations using data from 123 nations. Calculation accuracy was measured by the average absolute error and the average relative error, which were 0.004883 and 0.351078%, respectively. Our results demonstrate that the SVM-based EF model performs well.
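The two accuracy figures above are averages over the predicted nations. A minimal sketch, assuming the usual definitions (the mean of |actual - predicted| and the mean of |actual - predicted| / |actual| expressed as a percentage), since the abstract does not spell out its formulas; the function names and sample values are illustrative.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_relative_error_pct(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed as a percentage."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("relative error is undefined when an actual value is 0")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
```

For example, actual values [2.0, 4.0] and predictions [2.1, 3.9] give an absolute error of about 0.1 and a relative error of about 3.75%.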
2.
High-content screening studies of mitotic checkpoints are important for identifying cancer targets and developing novel cancer-specific therapies. A crucial step in such a study is determining the stage of the cell cycle. Because of the overwhelming number of cells assayed in a high-content screening experiment and the multiple factors that must be considered to determine mitotic subphases accurately, an automated classifier is necessary. In this article, the authors describe in detail a support vector machine (SVM) classifier that they implemented to recognize various mitotic subphases. In contrast to previous studies of subcellular pattern recognition, they used only low-resolution cell images and a few parameters that can be calculated inexpensively with off-the-shelf image-processing software. The performance of the SVM was evaluated with a cross-validation method and was shown to be comparable to that of a human expert.
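The abstract evaluates the classifier by cross-validation without stating the number of folds or how cells were assigned to them. As a generic illustration only, a k-fold split can be sketched in plain Python; `k_fold_indices` is a hypothetical helper, and in practice the indices would be shuffled (or stratified by subphase) before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Samples are assigned to folds in their given order, so shuffle (or
    stratify) beforehand if the data are sorted by class.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

Each sample lands in exactly one test fold, so every cell image is scored once by a model that never saw it during training.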
3.
We developed a probability-based machine-learning program, Colander, to identify tandem mass spectra that are highly likely to represent phosphopeptides prior to database search. We identified statistically significant diagnostic features of phosphopeptide tandem mass spectra based on ion trap CID MS/MS experiments. Statistics for the features were calculated from 376 validated phosphopeptide spectra and 376 nonphosphopeptide spectra. A probability-based support vector machine (SVM) program, Colander, was then trained on five selected features. Data sets were assembled from LC/LC-MS/MS analyses of large-scale phosphopeptide enrichments from proteolyzed cells and tissues, and from synthetic phosphopeptides. These data sets were used to evaluate the capability of Colander to select pS/pT-containing phosphopeptide tandem mass spectra. When applied to unknown tandem mass spectra, Colander can routinely remove 80% of tandem mass spectra while retaining 95% of phosphopeptide tandem mass spectra. The program significantly reduced the computational time spent on database search, by 60-90%. Furthermore, prefiltering for tandem mass spectra representing phosphopeptides can increase the number of phosphopeptide identifications under a predefined false positive rate.
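The headline numbers (about 80% of spectra removed while 95% of phosphopeptide spectra are retained) are two rates evaluated at a score cutoff. A minimal sketch of how such rates could be computed from classifier probabilities and known labels; the cutoff, the scores and the helper name `prefilter_stats` are hypothetical and not part of Colander.

```python
def prefilter_stats(scores, is_phospho, threshold):
    """Return (fraction_removed, phospho_retained) for a score cutoff.

    scores: classifier probability that each spectrum is a phosphopeptide.
    is_phospho: ground-truth labels (True for phosphopeptide spectra).
    Spectra with score < threshold are discarded before database search.
    """
    kept = [s >= threshold for s in scores]
    fraction_removed = 1 - sum(kept) / len(scores)
    n_phospho = sum(is_phospho)
    phospho_kept = sum(k and p for k, p in zip(kept, is_phospho))
    phospho_retained = phospho_kept / n_phospho if n_phospho else float("nan")
    return fraction_removed, phospho_retained
```

Sweeping `threshold` trades search time (more spectra removed) against sensitivity (fewer phosphopeptide spectra retained).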
4.
MOTIVATION: Numerous methods for predicting beta-turns in proteins have been developed based on various computational schemes. Here, we introduce a new beta-turn prediction method that uses the support vector machine (SVM) algorithm together with predicted secondary structure information. Various SVM parameters were tuned to achieve optimal prediction performance. RESULTS: The SVM method achieved excellent performance as measured by the Matthews correlation coefficient (MCC = 0.45) in 7-fold cross-validation on a database of 426 non-homologous protein chains. To the best of our knowledge, this is the highest MCC achieved so far for beta-turn prediction. The overall prediction accuracy Qtotal was 77.3%, the best among existing prediction methods. Among its attractive features, the present SVM method avoids overtraining, compresses information, and provides a predicted reliability index.
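MCC and Qtotal are both computed from the confusion-matrix counts (true/false positives and negatives) for beta-turn versus non-turn residues. A small sketch using the standard formulas; any counts passed in are illustrative, not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def q_total(tp, tn, fp, fn):
    """Fraction of residues (turn and non-turn) predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)
```

Unlike Qtotal, MCC stays informative when non-turn residues heavily outnumber turn residues, which is why both are reported.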
5.
ESVM: evolutionary support vector machine for automatic feature selection and classification of microarray data (total citations: 1; self-citations: 0; citations by others: 1)
An optimal design of support vector machine (SVM)-based classifiers for prediction aims to optimize the combination of feature selection, SVM parameter settings, and cross-validation methods. However, SVMs offer no mechanism for automatically detecting relevant features internally, and the appropriate setting of their control parameters is often treated as a separate problem. This paper proposes an evolutionary approach to designing an SVM-based classifier (named ESVM) that simultaneously optimizes automatic feature selection and parameter tuning using an intelligent genetic algorithm, combined with k-fold cross-validation as an estimator of generalization ability. To illustrate and evaluate the efficiency of ESVM, a typical application to microarray classification using 11 multi-class datasets is adopted. To account for model uncertainty, a frequency-based technique that votes on multiple sets of potentially informative features is used to identify the most effective subset of genes. Averaged over the 11 datasets, ESVM achieves a high accuracy of 96.88% with only 10.0 selected genes, using 10-fold cross-validation. The merits of ESVM are threefold: (1) automatic feature selection and parameter setting embedded in ESVM improve prediction ability compared with traditional SVMs; (2) ESVM can serve not only as an accurate classifier but also as an adaptive feature extractor; and (3) ESVM is an efficient tool in which various SVMs can conveniently be used as the core for bioinformatics problems.
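The frequency-based voting step can be illustrated by counting how often each gene appears across the subsets selected in repeated runs. This is a sketch under assumptions: the abstract does not give ESVM's voting rule, so the `min_frequency` threshold and the gene names are hypothetical.

```python
from collections import Counter


def vote_features(selected_sets, min_frequency):
    """Keep features chosen in at least `min_frequency` (a fraction) of the runs.

    selected_sets: one collection of feature (gene) names per run.
    Returns the surviving features, most frequently selected first.
    """
    # set(subset) ensures a gene is counted at most once per run.
    counts = Counter(g for subset in selected_sets for g in set(subset))
    n_runs = len(selected_sets)
    kept = [(g, c) for g, c in counts.items() if c / n_runs >= min_frequency]
    # Sort by descending count, then by name for a deterministic order.
    kept.sort(key=lambda gc: (-gc[1], gc[0]))
    return [g for g, _ in kept]
```

For instance, with three runs selecting {g1, g2}, {g1, g3} and {g1, g2} and a threshold of 0.5, only g1 and g2 survive.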
6.
Krause L McHardy AC Nattkemper TW Pühler A Stoye J Meyer F 《Nucleic acids research》2007,35(2):540-549
We present GISMO, a novel prokaryotic gene finder that combines searches for protein family domains with composition-based classification using a support vector machine. GISMO is highly accurate, exhibiting high sensitivity and specificity in gene identification. We found that it performs well for complete prokaryotic chromosomes irrespective of their GC content, and also for plasmids as short as 10 kb, for short genes, and for genes with atypical sequence composition. Using GISMO, we found several thousand new predictions for published genomes that are supported by extrinsic evidence, which strongly suggests that they are biologically active genes. The source code for GISMO is freely available under the GPL license.
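Composition-based gene classifiers typically turn each candidate open reading frame into a vector of sequence statistics. The abstract does not list GISMO's features; assuming codon usage and GC content as representative examples, they could be computed as follows (the function names are illustrative).

```python
from collections import Counter


def codon_usage(orf):
    """Relative frequency of each codon in an open reading frame (ORF).

    A trailing partial codon (length not a multiple of 3) is ignored.
    """
    orf = orf.upper()
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    total = len(codons)
    return {c: n / total for c, n in Counter(codons).items()} if total else {}


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
```

A classifier trained on such vectors can still be applied to genomes of very different GC content, since codon frequencies capture more than the overall base composition.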
7.
8.
Protein S-nitrosylation plays a key and specific role in many cellular processes. Detecting possible S-nitrosylated substrates and their exact modification sites is crucial for studying the mechanisms of these processes. Compared with expensive and time-consuming biochemical experiments, computational methods are attracting considerable attention because of their convenience and speed. Although some computational models have been developed to predict S-nitrosylation sites, their accuracy is still low. In this work, we use a support vector machine to predict protein S-nitrosylation sites. After carefully evaluating six encoding schemes, we propose a new, efficient predictor, CPR-SNO, which uses a coupling-pattern-based encoding scheme. CPR-SNO achieves an area under the ROC curve (AUC) of 0.8289 in 10-fold cross-validation experiments, significantly better than the 0.685 of GPS-SNO 1.0, the best existing method. In further annotation of large-scale potential S-nitrosylated substrates, CPR-SNO also shows encouraging predictive performance. These results indicate that CPR-SNO can serve the biological community as a competitive predictor of protein S-nitrosylation sites. CPR-SNO has been implemented as a web server and is available at http://math.cau.edu.cn/CPR-SNO/CPR-SNO.html.
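The AUC reported for CPR-SNO equals the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative site. A minimal, quadratic-time sketch of that rank-based definition (ties counted as one half); it is not the authors' evaluation code.

```python
def roc_auc(scores_pos, scores_neg):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

An AUC of 0.5 corresponds to random ranking, so the gap between 0.685 and 0.8289 is larger than it may first appear.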
9.
10.
Purpose: Adaptive radiation therapy (ART) is an advanced field of radiation oncology. Image-guided radiation therapy (IGRT) methods can support daily setup and assess anatomical variations during therapy, which could prevent incorrect dose distribution and unexpected toxicities. Re-planning to correct these anatomical variations should be done daily or weekly, but applying it to a large number of patients still requires considerable time and resources. Using unsupervised machine learning on retrospective data, we have developed a predictive network to identify patients who would benefit from re-planning. Methods: 1200 MVCT scans from 40 head and neck (H&N) cases were re-contoured automatically using deformable hybrid registration and structure mapping. The deformable algorithm and an in-house machine-learning process developed in MATLAB® allow critical issues in Tomotherapy treatments to be predicted. Results: Through retrospective analysis of H&N treatments, we investigated and predicted tumor shrinkage and organ-at-risk (OAR) deformations. Support vector machine (SVM) and cluster analysis identified cases or treatment sessions with potential critical issues, based on dose and volume discrepancies between fractions. During the first weeks of treatment, 84% of patients showed an output comparable to average standard radiation treatment behavior. Starting from the 4th week, significant morpho-dosimetric changes affected 77% of patients, suggesting a need for re-planning. Delivered treatment and ART simulation were compared using receiver operating characteristic (ROC) curves, which showed a monotonic increase in ROC area. Conclusions: Warping methods, supported by daily image analysis and predictive tools, can improve the personalization and monitoring of each treatment, thereby minimizing anatomical and dosimetric divergences from the initial constraints.
11.
This paper presents essential findings on using ranking-based kernels to analyze and exploit high-dimensional, noisy biomedical data in applied clinical diagnostics. We claim that the presented kernels, combined with a state-of-the-art classification technique, the support vector machine (SVM), could significantly improve the classification rate and predictive power of the wrapper method, e.g. SVM. Moreover, the advantages of such kernels could potentially be exploited in other kernel methods and in essential computer-aided tasks such as novelty detection and clustering. Our experimental results and theoretical generalization bounds imply that ranking-based kernels outperform other traditionally employed SVM kernels on high-dimensional biomedical and microarray data.
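The abstract does not specify which ranking-based kernels were used. As one example of the idea, swapped in here for illustration, a kernel based on Kendall's tau compares two samples only through the relative order of their feature values, which makes it insensitive to monotone rescaling of noisy expression measurements.

```python
from itertools import combinations


def kendall_kernel(x, y):
    """Kendall tau (tau-a) between two equal-length feature vectors.

    Only the relative ordering of features within each sample matters, so the
    value is unchanged by any strictly increasing transform of x or y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length >= 2")
    score = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)  # +1 concordant, -1 discordant, 0 tie
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs
```

Filling a Gram matrix with `kendall_kernel(samples[a], samples[b])` would let such a kernel be plugged into an SVM implementation that accepts precomputed kernels.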
12.
Plewczynski D Tkacz A Wyrwicz LS Rychlewski L Ginalski K 《Journal of molecular modeling》2008,14(1):69-76
We present a recent update of the AutoMotif Server (AMS 2.0), which predicts post-translational modification sites in protein sequences. The support vector machine (SVM) algorithm was trained on data gathered in 2007 from various sets of proteins containing experimentally verified chemical modifications. Short sequence segments around each modification site were extracted from the parent protein and represented in the training set as binary or profile vectors. The updated efficiency of the SVM classification for each type of modification, and the predictive power of both representations, were estimated using leave-one-out tests for the model of general phosphorylation and for modifications catalyzed by several specific protein kinases. The accuracy of the method improved in comparison to the previous version of the service (Plewczynski et al., “AutoMotif server: prediction of single residue post-translational modifications in proteins”, Bioinformatics 21: 2525–7, 2005). The precision of the updated version reached over 90% for selected types of phosphorylation and was optimized at the cost of a lower recall value for the classification model. The AutoMotif Server version 2007 is freely available at . Additionally, the reference dataset for optimizing the prediction of phosphorylation sites, collected from UniProtKB, is also provided and can be accessed at .
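The binary representation of a short segment around a candidate site can be sketched as one-hot encoding of each residue in the window. The window half-width, the 20-letter alphabet ordering and the zero-padding at protein ends are assumptions for illustration, not AMS 2.0 settings.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed ordering)


def binary_window(sequence, site, half_width):
    """One-hot encode the segment sequence[site-half_width : site+half_width+1].

    Positions outside the protein (or non-standard residues) are encoded as
    all-zero rows, a padding convention assumed here. Returns a flat list of
    0/1 values of length (2 * half_width + 1) * 20.
    """
    vector = []
    for pos in range(site - half_width, site + half_width + 1):
        row = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence) and sequence[pos] in AMINO_ACIDS:
            row[AMINO_ACIDS.index(sequence[pos])] = 1
        vector.extend(row)
    return vector
```

The profile representation mentioned above would replace each 0/1 row with position-specific residue frequencies from a sequence profile.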
13.
Knowledge about protein-protein interactions (PPIs) unveils the molecular mechanisms of biological processes. However, the volume of published biomedical literature on protein interactions is expanding rapidly, making it increasingly difficult for interaction database curators to detect and curate protein interaction information manually. We present a multiple kernel learning-based approach for automatic PPI extraction from biomedical literature. The approach combines feature-based, tree, and graph kernels and integrates their outputs with a Ranking support vector machine (SVM). Experimental evaluations show that the features in the individual kernels are complementary and that the kernel combination with Ranking SVM performs better than the individual kernels, equal-weight combination, and optimal-weight combination. Our approach achieves state-of-the-art performance in comparable evaluations, with a 64.88% F-score and 88.02% AUC on the AImed corpus.
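The equal-weight and fixed-weight baselines mentioned above amount to a weighted sum of the individual kernel (Gram) matrices; the proposed method instead combines the kernels' outputs with a Ranking SVM, which is not shown here. A minimal sketch of the baseline combination using nested lists and hypothetical weights:

```python
def combine_kernels(kernels, weights):
    """Weighted sum of equally sized square kernel (Gram) matrices.

    Non-negative weights keep the result a valid (positive semidefinite) kernel.
    """
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights)) for j in range(n)]
            for i in range(n)]
```

With weights [1/3, 1/3, 1/3] over the feature-based, tree and graph kernels this gives the equal-weight combination.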
14.
Apoptosis, or programmed cell death, plays an important role in the development of an organism. Information on the subcellular location of apoptosis proteins is very helpful for understanding the apoptosis mechanism. In this paper, based on the idea that the positional distribution of amino acids is closely related to protein structure and function, we adopt the concept of distance frequency [Matsuda, S., Vert, J.P., Ueda, N., Toh, H., Akutsu, T., 2005. A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci. 14, 2804-2813] and propose a novel way to calculate distance frequencies. To capture local features, each protein sequence is divided into p parts of equal length. We then use this novel representation of protein sequences with a support vector machine to predict subcellular location. Jackknife tests show that the overall prediction accuracy is significantly improved.
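The local-feature step can be sketched as splitting the sequence into p contiguous parts and computing a distance-frequency vector for each part. The cited distance-frequency definition is not restated in the abstract; this sketch assumes one simple reading (for each amino acid, a histogram of gaps between consecutive occurrences, up to a hypothetical `max_d`) and spreads any remainder over the first parts when the length is not divisible by p.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def split_equal(seq, p):
    """Split seq into p contiguous parts whose lengths differ by at most one."""
    size, extra = divmod(len(seq), p)
    parts, start = [], 0
    for i in range(p):
        end = start + size + (1 if i < extra else 0)
        parts.append(seq[start:end])
        start = end
    return parts


def distance_frequencies(segment, max_d):
    """For each amino acid, count gaps d (1..max_d) between consecutive occurrences."""
    features = []
    for aa in AMINO_ACIDS:
        positions = [i for i, c in enumerate(segment) if c == aa]
        counts = [0] * max_d
        for a, b in zip(positions, positions[1:]):
            if b - a <= max_d:
                counts[b - a - 1] += 1
        features.extend(counts)
    return features


def local_features(seq, p, max_d):
    """Concatenate the distance-frequency vectors of the p parts (20 * max_d each)."""
    return [f for part in split_equal(seq, p) for f in distance_frequencies(part, max_d)]
```

The resulting vector has a fixed length of p * 20 * max_d, which is what allows sequences of different lengths to be fed to the same SVM.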
15.
DNA methylation plays a key role in the regulation of gene expression. The most common type of DNA modification is methylation of the cytosine in CpG dinucleotides. At present, no method is available for predicting DNA methylation sites. In this study, we therefore developed a support vector machine (SVM)-based method for predicting cytosine methylation in CpG dinucleotides. An SVM module was first developed from human data to predict human-specific methylation sites. This module achieved an MCC of 0.501 and an AUC of 0.814 in 5-fold cross-validation. Its performance was better than that of classifiers built with alternative machine learning and statistical algorithms, including artificial neural networks, Bayesian statistics, and decision trees. Additional SVM modules were also developed based on mammalian- and vertebrate-specific methylation patterns. The SVM module based on human methylation patterns was used for genome-wide analysis of methylation sites. This analysis showed that the percentage of methylated CpGs is higher in UTRs than in exonic and intronic regions of human genes. The method is available online for public use under the name Methylator at http://bio.dfci.harvard.edu/Methylator/.
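Before an SVM can score CpG sites, each site has to be located and turned into a window of flanking sequence. A minimal sketch of that extraction step; the flank size and the clipping at sequence ends are assumptions, and Methylator's actual features are not described in the abstract.

```python
def cpg_windows(seq, flank):
    """Return (position, window) for every CpG, where position indexes the C.

    Each window spans `flank` bases on either side of the CG dinucleotide and
    is clipped at the sequence ends.
    """
    seq = seq.upper()
    out = []
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] == "G":
            out.append((i, seq[max(0, i - flank): i + 2 + flank]))
    return out
```

Windows clipped at the sequence ends are shorter than the rest, so a real pipeline would pad or discard them before encoding.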
16.
David R. Labbe Jacques A. de Guise Neila Mezghani Véronique Godbout Guy Grimard David Baillargeon Patrick Lavigne Julio Fernandes Pierre Ranger Nicola Hagemeister 《Journal of biomechanics》2011,44(1):1-5
The pivot shift test is the only clinical test that has been shown to correlate with subjective criteria of knee joint function following rupture of the anterior cruciate ligament. The grade of the pivot shift is important in predicting short- and long-term outcome. However, because this grade is established subjectively by a clinician, the pivot shift's value as a clinical tool is reduced. The purpose of this study was to develop a system that objectively grades the pivot shift test based on recorded knee joint kinematics. Fifty-six subjects with different degrees of knee joint stability had the pivot shift test performed by one of eight orthopaedic surgeons while their knee joint kinematics were recorded. A support vector machine-based algorithm was used to objectively classify these recordings according to clinical grade. The grades established by the surgeons were used as the gold standard for developing the classifier. There was substantial agreement between our classifier and the surgeons in establishing the grade (weighted kappa = 0.68). Seventy-one of 107 recordings (66%) were given the same grade, and 96% of the time our classifier was within one grade of that given by the surgeons. Moreover, grades 0 and 1 were distinguished from grades 2 and 3 with 86% sensitivity and 90% specificity. Our results show the feasibility of automatically grading the pivot shift based on knee joint kinematics, in a manner similar to that of an experienced clinician.
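Weighted kappa credits near-misses between ordinal grades, which suits pivot shift grading, where disagreeing by one grade is less serious than disagreeing by three. A sketch of Cohen's weighted kappa and the within-one-grade agreement rate; the abstract does not say whether linear or quadratic weights were used, so linear weights are an assumed default.

```python
def weighted_kappa(rater_a, rater_b, n_categories, weighting="linear"):
    """Cohen's weighted kappa for ordinal grades 0..n_categories-1."""
    if n_categories < 2:
        raise ValueError("need at least two categories")
    n = len(rater_a)
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(n_categories)) for j in range(n_categories)]
    max_d = n_categories - 1
    num = den = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            d = abs(i - j) / max_d
            w = d if weighting == "linear" else d * d  # disagreement weight
            num += w * observed[i][j]
            den += w * row[i] * col[j] / n  # expected count under independence
    return 1.0 - num / den


def within_one(rater_a, rater_b):
    """Fraction of cases where the two grades differ by at most one."""
    return sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b)) / len(rater_a)
```

Passing `weighting="quadratic"` penalizes large disagreements more heavily, which generally yields a different value on the same ratings.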
17.
A number of methods have been developed for predicting levels of solvent accessibility or accessible surface area (ASA) of amino acid residues in proteins. These methods predict either regularly spaced states of relative solvent accessibility or a continuous (analogue) real value indicating relative solvent accessibility. While discrete states of exposure can easily be obtained by applying thresholds to predicted or computed real values of ASA after prediction, the reverse, obtaining a real value from quantized states of predicted ASA, is not straightforward, because a two-state prediction would give large real-valued errors in such cases. However, predicting ASA in a larger number of states and then finding a corresponding scheme for real-value prediction may help integrate the two approaches to ASA prediction. We report a novel method of obtaining real numerical values of solvent accessibility using an accumulation cutoff set and a support vector machine. This so-called SVM-Cabins method first predicts discrete ASA states of amino acid residues from their evolutionary profile and then maps the predicted states onto a real-valued linear space by simple algebraic methods. The performance of this approach using 13-state ASA prediction is at least comparable to that of the best ASA prediction methods reported so far. The mean absolute error of this method reaches a best value of 15.1% on the tested data set of 502 proteins, with a correlation coefficient of 0.66. Since the method starts with the prediction of discrete ASA states and leads to real-value predictions, prediction performance in binary states and real values is optimized simultaneously.
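The state-to-value mapping can be illustrated with a set of cutoffs that partition relative accessibility into intervals: each residue is assigned the index of its interval for classification, and a predicted state is mapped back to a real value. The midpoint rule and any cutoff values used with it are assumptions; the abstract describes the SVM-Cabins mapping only as "simple algebraic methods".

```python
import bisect


def state_of(rsa, cutoffs):
    """Index k of the interval [cutoffs[k], cutoffs[k+1]) containing rsa.

    cutoffs must be sorted; the top edge cutoffs[-1] belongs to the last interval.
    """
    if not cutoffs[0] <= rsa <= cutoffs[-1]:
        raise ValueError("value outside cutoff range")
    k = bisect.bisect_right(cutoffs, rsa) - 1
    return min(k, len(cutoffs) - 2)


def state_to_value(state, cutoffs):
    """Map a predicted discrete state back to the midpoint of its interval (assumed rule)."""
    return (cutoffs[state] + cutoffs[state + 1]) / 2
```

With cutoffs [0, 25, 50, 100], a residue at 10% relative accessibility falls in state 0, and a predicted state 2 maps back to 75.0; more states give narrower intervals and smaller worst-case rounding error.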
18.
Fast Fourier transform-based support vector machine for subcellular localization prediction using different substitution models (total citations: 2; self-citations: 0; citations by others: 2)
There are approximately 10⁹ proteins in a cell. A central problem in bioinformatics is how to identify a protein's subcellular localization when its sequence is known. In this paper, a method using a fast Fourier transform-based support vector machine is developed to predict the subcellular localization of proteins from their physicochemical properties and structural parameters. Prediction accuracies reached 83% for prokaryotic organisms and 84% for eukaryotic organisms with the substitution model of the c-p-v matrix (c, composition; p, polarity; v, molecular volume). The overall prediction accuracy was also evaluated using the leave-one-out jackknife procedure. The influence of the substitution model on prediction accuracy is also discussed. The source code of the new program is available on request from the authors.
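The Fourier step turns a residue-by-residue property profile into a fixed-length vector that an SVM can use regardless of sequence length. A minimal sketch, assuming the Kyte-Doolittle hydropathy scale as the property (swapped in for illustration; the paper's c-p-v substitution model is not reproduced) and normalized DFT magnitudes as features; `n_coeffs` is a hypothetical parameter.

```python
import cmath

# Kyte-Doolittle hydropathy values, used here only as an example property scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def spectrum_features(seq, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients of a property signal.

    A naive O(N^2) DFT is used for clarity; an FFT yields the same coefficients
    faster. Magnitudes are divided by N so different lengths are comparable.
    """
    x = [KD[a] for a in seq]
    n = len(x)
    feats = []
    for k in range(n_coeffs):
        coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        feats.append(abs(coeff) / n)
    return feats
```

The zeroth coefficient is simply the mean property value; higher coefficients capture periodic patterns such as alternating hydrophobic and polar residues.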
19.
As an important post-translational modification of prokaryotic proteins, pupylation plays a key role in regulating various biological processes. Accurate identification of pupylation sites is crucial for understanding the underlying mechanisms of pupylation. Although several computational methods have been developed to identify pupylation sites, their prediction accuracy is still unsatisfactory. Here, a novel bioinformatics tool named IMP–PUP is proposed to improve the prediction of pupylation sites. IMP–PUP is built on the composition of k-spaced amino acid pairs and trained with a modified semi-supervised self-training support vector machine (SVM) algorithm. The proposed algorithm iteratively trains a series of SVM classifiers on both annotated and non-annotated pupylated proteins. Computational results show that IMP–PUP achieves areas under the receiver operating characteristic curve of 0.91, 0.73, and 0.75 on our training set, Tung's testing set, and our testing set, respectively, which are better than those of the different-error-costs SVM algorithm and the original self-training SVM algorithm. Independent tests also show that IMP–PUP significantly outperforms three other existing pupylation site predictors: GPS–PUP, iPUP, and pbPUP. IMP–PUP can therefore be a useful tool for accurate prediction of pupylation sites. A MATLAB software package for IMP–PUP is available at https://juzhe1120.github.io/.
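The composition of k-spaced amino acid pairs (CKSAAP) counts, for each gap k, how often every ordered residue pair appears k positions apart, normalized by the number of such pairs in the segment. A minimal sketch; the range of k (`k_max`) and the segment length used by IMP–PUP are not given in the abstract, and the self-training loop is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pair types


def cksaap(segment, k_max):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each k, count residue pairs (segment[i], segment[i+k+1]) and divide by
    the number of such pairs, giving 400 features per k.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(segment) - k - 1
        for i in range(n_pairs):
            pair = segment[i] + segment[i + k + 1]
            if pair in counts:  # skip pairs with non-standard residues
                counts[pair] += 1
        features.extend(counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in PAIRS)
    return features
```

The vector has 400 * (k_max + 1) entries, so small values of k_max already yield a high-dimensional but sparse input for the SVM.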
20.
A robust hybrid between genetic algorithm and support vector machine for extracting an optimal feature gene subset (total citations: 5; self-citations: 0; citations by others: 5)
Developing a robust and efficient approach for extracting useful information from microarray data remains a significant and challenging task. Microarray data are characterized by high dimensionality, high signal-to-noise ratio, and high correlations between genes, but a relatively small sample size. Current methods for dimensionality reduction can be further improved for scenarios in which a single (or a few) highly influential gene(s) are present whose effect in the feature subset would prevent the inclusion of other important genes. We have formalized a robust gene selection approach based on a hybrid between a genetic algorithm and a support vector machine. The major goal of this hybridization was to fully exploit their respective merits (e.g., robustness to the size of the solution space and the capability of handling a very large number of feature genes) to identify key feature genes (or molecular signatures) for a complex biological phenotype. We applied the approach to microarray data of diffuse large B cell lymphoma to demonstrate its behavior and properties in mining high-dimensional, genome-wide gene expression profiles. The resulting classifier(s) (the optimal gene subset(s)) achieved the highest accuracy (99%) for predicting independent microarray samples, compared with marginal filters and a hybrid between a genetic algorithm and K nearest neighbors.
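The genetic-algorithm half of such a hybrid can be sketched as evolving binary feature masks. In the described approach the fitness of a mask would come from an SVM trained on the selected genes; here a toy fitness function (hypothetical) stands in for it, and the population size, mutation rate and operators are illustrative choices rather than the authors' settings.

```python
import random


def genetic_feature_search(n_features, fitness, pop_size=20, generations=30,
                           mutation_rate=0.05, seed=0):
    """Evolve feature bitmasks (tuples of 0/1) to maximize fitness(mask).

    Uses tournament selection, one-point crossover, bit-flip mutation and
    elitism (the best mask always survives), so the best score never decreases.
    Requires n_features >= 2 for the crossover point.
    """
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)
            child = p1[:cut] + p2[cut:]
            # Flip each bit with probability mutation_rate (int ^ bool -> int).
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


def toy_fitness(mask):
    """Hypothetical stand-in for SVM cross-validation accuracy:
    reward features 0-2 and penalize the size of the subset."""
    return sum(mask[:3]) - 0.1 * sum(mask)
```

Replacing `toy_fitness` with a cross-validated SVM score is what would turn this sketch into a GA/SVM wrapper; the size penalty mirrors the goal of a compact gene signature.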