1.

Glycosylation site prediction using ensembles of Support Vector Machine classifiers

Cornelia Caragea Jivko Sinapov Adrian Silvescu Drena Dobbs Vasant Honavar 《BMC bioinformatics》2007,8(1):438

Background

Glycosylation is one of the most complex post-translational modifications (PTMs) of proteins in eukaryotic cells. Glycosylation plays an important role in biological processes ranging from protein folding and subcellular localization, to ligand recognition and cell-cell interactions. Experimental identification of glycosylation sites is expensive and laborious. Hence, there is significant interest in the development of computational methods for reliable prediction of glycosylation sites from amino acid sequences.

2.

Transcription-based prediction of response to IFNbeta using supervised computational methods

Baranzini SE Mousavi P Rio J Caillier SJ Stillman A Villoslada P Wyatt MM Comabella M Greller LD Somogyi R Montalban X Oksenberg JR 《PLoS biology》2005,3(1):e2

3.

Lysine acetylation sites prediction using an ensemble of support vector machine classifiers (Cited by: 1; self-citations: 0; citations by others: 1)

Yan Xu Xiao-Bo Wang Ling-Yun Wu 《Journal of theoretical biology》2010,264(1):130-99

Lysine acetylation is an essentially reversible and highly regulated post-translational modification that regulates diverse protein properties. Experimental identification of acetylation sites is laborious and expensive. Hence, there is significant interest in the development of computational methods for reliable prediction of acetylation sites from amino acid sequences. In this paper we use an ensemble of support vector machine classifiers for this task. The experimentally determined lysine acetylation sites are extracted from the Swiss-Prot database and the scientific literature. Experimental results show that an ensemble of support vector machine classifiers outperforms a single support vector machine classifier and other computational methods such as PAIL and LysAcet on the problem of predicting lysine acetylation sites. The resulting method has been implemented in EnsemblePail, a web server for lysine acetylation site prediction available at http://www.aporc.org/EnsemblePail/.

4.

Robust two-gene classifiers for cancer prediction

Wang X 《Genomics》2012,99(2):90-95

Two-gene classifiers have attracted broad interest for their simplicity and practicality. Most existing two-gene classification algorithms rely on exhaustive search, which makes them slow. In this study, we proposed two new two-gene classification algorithms that used a simple univariate gene selection strategy and constructed simple classification rules based on optimal cut-points for the two selected genes. We detected the optimal cut-point with the information entropy principle. We applied the two-gene classification models to eleven cancer gene expression datasets and compared their classification performance to that of some established two-gene classification models, such as the top-scoring pairs model and the greedy pairs model, as well as standard methods including Diagonal Linear Discriminant Analysis, k-Nearest Neighbor, Support Vector Machine and Random Forest. These comparisons indicated that the performance of our two-gene classifiers was comparable to or better than that of the compared models.
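The entropy-based cut-point idea mentioned in this abstract can be illustrated for a single gene. The following is a minimal sketch, not the authors' algorithm: the function names, the midpoint candidate thresholds, and the toy expression values are all assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut_point(values, labels):
    """Return (cut, weighted_entropy) for the threshold that minimises
    the size-weighted class entropy of the two resulting groups."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        cut = (lo + hi) / 2  # candidate: midpoint between neighbours
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if h < best[1]:
            best = (cut, h)
    return best

# Hypothetical expression values for one gene across six samples
expression = [1.2, 1.9, 2.1, 5.0, 5.5, 6.3]
classes = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
cut, h = best_cut_point(expression, classes)
print(cut, h)
```

A classification rule for one gene is then simply "expression above the cut-point implies one class"; a two-gene rule would combine two such thresholds.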

5.

RNA secondary structure prediction from sequence alignments using a network of k-nearest neighbor classifiers (Cited by: 3; self-citations: 0; citations by others: 3)

Bindewald E Shapiro BA 《RNA (New York, N.Y.)》2006,12(3):342-352

We present a machine learning method (a hierarchical network of k-nearest neighbor classifiers) that uses an RNA sequence alignment in order to predict a consensus RNA secondary structure. The input to the network is the mutual information, the fraction of complementary nucleotides, and a novel consensus RNAfold secondary structure prediction of a pair of alignment columns and its nearest neighbors. Given this input, the network computes a prediction as to whether a particular pair of alignment columns corresponds to a base pair. By using a comprehensive test set of 49 RFAM alignments, the program KNetFold achieves an average Matthews correlation coefficient of 0.81. This is a significant improvement compared with the secondary structure prediction methods PFOLD and RNAalifold. By using the example of archaeal RNase P, we show that the program can also predict pseudoknot interactions.
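One of the input features named in this abstract, the mutual information of a pair of alignment columns, can be computed directly from symbol frequencies. The sketch below shows the standard definition only; it is not taken from KNetFold, and the function name and toy columns are assumptions.

```python
import math
from collections import Counter

def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns,
    given as equal-length sequences of nucleotide symbols."""
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi

# Hypothetical columns: perfectly covarying base pairs across four sequences
covarying = mutual_information("GCAU", "CGUA")
# Fully conserved columns carry no covariation signal
conserved = mutual_information("GGGG", "CCCC")
print(covarying, conserved)
```

High mutual information suggests that two positions mutate together, which is evidence for a base pair; conserved columns score zero, which is why the abstract combines this feature with others.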

6.

A new avenue for classification and prediction of olive cultivars using supervised and unsupervised algorithms

AH Beiki S Saboor M Ebrahimi 《PloS one》2012,7(9):e44164

Various methods have been used to identify cultivars of olive trees; here we used different bioinformatics algorithms to propose new tools to classify 10 olive cultivars based on RAPD and ISSR genetic marker datasets generated from PCR reactions. Five RAPD markers (OPA0a21, OPD16a, OP01a1, OPD16a1 and OPA0a8) and five ISSR markers (UBC841a4, UBC868a7, UBC841a14, U12BC807a and UBC810a13) were selected as the most important markers by all attribute weighting models. K-Medoids unsupervised clustering run on the SVM dataset was fully able to assign each olive cultivar to the right class. All 176 trees induced by decision tree models were meaningful, and the UBC841a4 attribute clearly distinguished between foreign and domestic olive cultivars with 100% accuracy. Predictive machine learning algorithms (SVM and Naïve Bayes) were also able to predict the right class of olive cultivars with 100% accuracy. For the first time, our results showed that data mining techniques can be effectively used to distinguish between plant cultivars, and that the machine learning based systems proposed in this study can predict new olive cultivars with the best possible accuracy.

7.

Combining classifiers for HIV-1 drug resistance prediction

Srisawat A Kijsirikul B 《Protein and peptide letters》2008,15(5):435-442

This paper applies and studies the behavior of three learning algorithms, i.e. the Support Vector Machine (SVM), the Radial Basis Function network (the RBF network), and k-Nearest Neighbor (k-NN), for predicting HIV-1 drug resistance from genotype data. In addition, a new algorithm for classifier combination is proposed. A comparison of the predictive performance of the three learning algorithms shows that SVM yields the highest average accuracy, the RBF network gives the highest sensitivity, and k-NN yields the best specificity. Finally, a comparison of the predictive performance of the composite classifier with that of the three learning algorithms demonstrates that the proposed composite classifier provides the highest average accuracy.
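The three measures compared in this abstract (accuracy, sensitivity, specificity) all derive from the same confusion-matrix counts. A minimal sketch follows, assuming binary labels with 1 for "resistant" and at least one sample of each class; the function name and the toy labels are illustrative, not from the paper.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity for binary predictions.
    Assumes both classes occur at least once in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Hypothetical labels: 1 = resistant, 0 = susceptible
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
metrics = confusion_metrics(y_true, y_pred)
print(metrics)
```

Because the three measures weight errors differently, one classifier can lead on sensitivity while another leads on specificity, as the abstract reports.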

8.

Predicting subcellular localization of proteins using machine-learned classifiers (Cited by: 11; self-citations: 0; citations by others: 11)

Lu Z Szafron D Greiner R Lu P Wishart DS Poulin B Anvik J Macdonell C Eisner R 《Bioinformatics (Oxford, England)》2004,20(4):547-556

MOTIVATION: Identifying the destination or localization of proteins is key to understanding their function and facilitating their purification. A number of existing computational prediction methods are based on sequence analysis. However, these methods are limited in scope, accuracy and most particularly breadth of coverage. Rather than using sequence information alone, we have explored the use of database text annotations from homologs and machine learning to substantially improve the prediction of subcellular location. RESULTS: We have constructed five machine-learning classifiers for predicting subcellular localization of proteins from animals, plants, fungi, Gram-negative bacteria and Gram-positive bacteria, which are 81% accurate for fungi and 92-94% accurate for the other four categories. These are the most accurate subcellular predictors across the widest set of organisms ever published. Our predictors are part of the Proteome Analyst web-service.

9.

TargetSpy: a supervised machine learning approach for microRNA target prediction

Martin Sturm Michael Hackenberg David Langenberger Dmitrij Frishman 《BMC bioinformatics》2010,11(1):292

Background

Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently, however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites.

10.

Learning Bayesian network classifiers using ant colony optimization

Khalid M. Salama Alex A. Freitas 《Swarm Intelligence》2013,7(2-3):229-254

Bayesian networks are knowledge representation tools that model the (in)dependency relationships among variables for probabilistic reasoning. Classification with Bayesian networks aims to compute the class with the highest probability given a case. Networks used in this way are referred to as Bayesian network classifiers. Since learning the Bayesian network structure from a dataset can be viewed as an optimization problem, heuristic search algorithms may be applied to build high-quality networks in medium- or large-scale problems, as exhaustive search is often feasible only for small problems. In this paper, we present our new algorithm, ABC-Miner, and propose several extensions to it. ABC-Miner uses ant colony optimization for learning the structure of Bayesian network classifiers. We report extended computational results comparing the performance of our algorithm with eight other classification algorithms, namely six variations of well-known Bayesian network classifiers, cAnt-Miner for discovering classification rules and a support vector machine algorithm.

11.

Functional impact of missense variants in BRCA1 predicted by supervised learning

Karchin R Monteiro AN Tavtigian SV Carvalho MA Sali A 《PLoS computational biology》2007,3(2):e26

Many individuals tested for inherited cancer susceptibility at the BRCA1 gene locus are discovered to have variants of unknown clinical significance (UCVs). Most UCVs cause a single amino acid residue (missense) change in the BRCA1 protein. They can be biochemically assayed, but such evaluations are time-consuming and labor-intensive. Computational methods that classify and suggest explanations for UCV impact on protein function can complement functional tests. Here we describe a supervised learning approach to classification of BRCA1 UCVs. Using a novel combination of 16 predictive features, the algorithms were applied to retrospectively classify the impact of 36 BRCA1 C-terminal (BRCT) domain UCVs biochemically assayed to measure transactivation function and to blindly classify 54 documented UCVs. Majority vote of three supervised learning algorithms is in agreement with the assay for more than 94% of the UCVs. Two UCVs found deleterious by both the assay and the classifiers reveal a previously uncharacterized putative binding site. Clinicians may soon be able to use computational classifiers such as those described here to better inform patients. These classifiers can be adapted to other cancer susceptibility genes and systematically applied to prioritize the growing number of potential causative loci and variants found by large-scale disease association studies.
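The majority-vote step described in this abstract, and the agreement rate against an assay, can be sketched as follows. The abstract does not name the three algorithms, so the per-variant votes, variant names, and assay labels below are hypothetical placeholders.

```python
from collections import Counter

def majority_vote(votes):
    """Return the label chosen by most classifiers.
    With three classifiers and two labels there is never a tie."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical predictions from three classifiers for each variant
predictions = {
    "variant_A": ["deleterious", "deleterious", "neutral"],
    "variant_B": ["neutral", "neutral", "neutral"],
    "variant_C": ["neutral", "deleterious", "deleterious"],
}
# Hypothetical functional-assay outcomes for the same variants
assay = {
    "variant_A": "deleterious",
    "variant_B": "neutral",
    "variant_C": "neutral",
}

consensus = {name: majority_vote(v) for name, v in predictions.items()}
agreement = sum(consensus[n] == assay[n] for n in assay) / len(assay)
print(consensus, agreement)
```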

12.

Document image binarisation using a supervised neural network

Khashman A Sekeroglu B 《International journal of neural systems》2008,18(5):405-418

Advances in digital technologies have allowed us to generate more images than ever. Images of scanned documents are one example, and they form a vital part of digital libraries and archives. Scanned degraded documents contain background noise and varying contrast and illumination; therefore, document image binarisation must be performed in order to separate foreground from background layers. Image binarisation is performed using either local adaptive thresholding or global thresholding, with local thresholding generally considered more successful. This paper presents a novel approach to global thresholding, in which a neural network is trained using local threshold values of an image in order to determine an optimum global threshold value, which is then used to binarise the whole image. The proposed method is compared with five local thresholding methods, and the experimental results indicate that our method is computationally cost-effective and capable of binarising scanned degraded documents with superior results.
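The difference between the two families of thresholding contrasted in this abstract can be shown on a toy grayscale image. This is a minimal sketch only: the local rule used here (compare each pixel with the mean of its neighbourhood) is one simple choice and is not claimed to be one of the five methods in the paper, and the neural network that selects the global threshold is not reproduced.

```python
def binarise_global(image, threshold):
    """Global thresholding: one threshold for every pixel.
    Output is 1 where the pixel is brighter than the threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in image]

def binarise_local_mean(image, radius=1):
    """Local adaptive thresholding: each pixel is compared with the
    mean intensity of its (2*radius+1)-wide neighbourhood."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                image[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(1 if image[y][x] > sum(window) / len(window) else 0)
        out.append(row)
    return out

# Hypothetical 2x2 grayscale image with intensities in 0..255
image = [[10, 200], [90, 130]]
global_result = binarise_global(image, 128)
local_result = binarise_local_mean(image, radius=1)
print(global_result, local_result)
```

The global variant costs one comparison per pixel, whereas the local variant inspects a whole window per pixel, which is the computational trade-off the abstract alludes to.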

13.

Improved prediction of malaria degradomes by supervised learning with SVM and profile kernel

Rui Kuang Jianying Gu Hong Cai Yufeng Wang 《Genetica》2009,137(2):243-243

14.

Improved prediction of malaria degradomes by supervised learning with SVM and profile kernel

Rui Kuang Jianying Gu Hong Cai Yufeng Wang 《Genetica》2009,136(1):189-209

The spread of drug resistance through malaria parasite populations calls for the development of new therapeutic strategies. However, the seemingly promising genomics-driven target identification paradigm is hampered by weak annotation coverage. To identify potentially important yet uncharacterized proteins, we apply support vector machines using profile kernels, a supervised discriminative machine learning technique for remote homology detection, as a complement to the traditional alignment-based algorithms. In this study, we focus on the prediction of proteases, which have long been considered attractive drug targets because of their indispensable roles in parasite development and infection. Our analysis demonstrates that an abundant and complex repertoire is conserved in five Plasmodium parasite species. Several putative proteases may be important components in networks that mediate cellular processes, including hemoglobin digestion, invasion, trafficking, cell cycle fate, and signal transduction. This catalog of proteases provides a short list of targets for functional characterization and rational inhibitor design. Rui Kuang and Jianying Gu have contributed equally to this work. An erratum to this article has been published.

15.

A critical evaluation of network and pathway-based classifiers for outcome prediction in breast cancer

Staiger C Cadot S Kooter R Dittrich M Müller T Klau GW Wessels LF 《PloS one》2012,7(4):e34796

Recently, several classifiers that combine primary tumor data, like gene expression data, and secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes. The secondary data sources are employed to guide this aggregation. Although many studies claim that these approaches improve classification performance over single-gene classifiers, the gain in performance is difficult to assess. This stems mainly from the fact that different breast cancer data sets and validation procedures are employed to assess the performance. Here we address these issues by employing a large cohort of six breast cancer data sets as a benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite feature classifiers do not outperform simple single-gene classifiers. We investigate the effect of (1) the number of selected features; (2) the specific gene set from which features are selected; (3) the size of the training set and (4) the heterogeneity of the data set on the performance of composite feature and single-gene classifiers. Strikingly, we find that randomization of secondary data sources, which destroys all biological information in these sources, does not result in a deterioration in performance of composite feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single-gene sets is similar to the stability of composite feature sets. Based on these results, there is currently no reason to prefer prognostic classifiers based on composite features over single-gene classifiers for predicting outcome in breast cancer.

16.

On the statistical assessment of classifiers using DNA microarray data

N Ancona R Maglietta A Piepoli A D'Addabbo R Cotugno M Savino S Liuni M Carella G Pesole F Perri 《BMC bioinformatics》2006,7(1):387-14

Background

In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia, Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer, such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data.

Results

We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate decreases as the training set size increases, reaching its best performance with 35 training examples. In this case, RLS and SVM have error rates of e = 14% (p = 0.027) and e = 11% (p = 0.019). Concerning the number of genes, we found about 6000 genes (p < 0.05) correlated with the pathology, resulting from the signal-to-noise statistic. Moreover, the performance of the RLS and SVM classifiers does not change when 74% of the genes are used. Performance degrades progressively, reaching e = 16% (p < 0.05), when only 2 genes are employed. The biological relevance of a set of genes determined by our statistical analysis and the major roles they play in colorectal tumorigenesis are discussed.
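The idea of attaching a p-value to a cross-validated error rate through a permutation test can be sketched as below. This is an illustration under stated assumptions, not the paper's procedure: a 1-D nearest-centroid classifier stands in for WVA, RLS and SVM, leave-one-out replaces Leave-K-Out, and the data, function names, and number of permutations are all hypothetical.

```python
import random

def loo_error(xs, ys):
    """Leave-one-out error of a 1-D nearest-centroid classifier."""
    errors = 0
    for i in range(len(xs)):
        train = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j != i]
        centroids = {}
        for label in set(y for _, y in train):
            vals = [x for x, y in train if y == label]
            centroids[label] = sum(vals) / len(vals)
        pred = min(centroids, key=lambda lab: abs(xs[i] - centroids[lab]))
        errors += pred != ys[i]
    return errors / len(xs)

def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Estimate how often shuffled labels give an error at least as low
    as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = loo_error(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loo_error(xs, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical one-gene expression values: 5 normal (0) and 5 tumor (1)
xs = [0.1, 0.3, 0.2, 0.4, 0.25, 2.0, 2.2, 1.9, 2.4, 2.1]
ys = [0] * 5 + [1] * 5
observed, p_value = permutation_p_value(xs, ys)
print(observed, p_value)
```

A small p-value indicates that the observed error rate is unlikely to arise when labels carry no information about the expression values, which is how values such as e = 21% (p = 0.045) should be read.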

Conclusions

The method proposed provides statistically significant answers to precise questions relevant for the diagnosis and prognosis of cancer. We found that, with as few as 15 examples, it is possible to train statistically significant classifiers for colon cancer diagnosis. As for the definition of the number of genes sufficient for a reliable classification of colon cancer, our results suggest that it depends on the accuracy required.

17.

Functional lipids and lipoplexes for improved gene delivery

Zhang XX McIntosh TJ Grinstaff MW 《Biochimie》2012,94(1):42-58

18.

Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. (Cited by: 84; self-citations: 0; citations by others: 84)

Margaret A Shipp Ken N Ross Pablo Tamayo Andrew P Weng Jeffery L Kutok Ricardo C T Aguiar Michelle Gaasenbeek Michael Angelo Michael Reich Geraldine S Pinkus Tane S Ray Margaret A Koval Kim W Last Andrew Norton T Andrew Lister Jill Mesirov Donna S Neuberg Eric S Lander Jon C Aster Todd R Golub 《Nature medicine》2002,8(1):68-74

Diffuse large B-cell lymphoma (DLBCL), the most common lymphoid malignancy in adults, is curable in less than 50% of patients. Prognostic models based on pre-treatment characteristics, such as the International Prognostic Index (IPI), are currently used to predict outcome in DLBCL. However, clinical outcome models identify neither the molecular basis of clinical heterogeneity, nor specific therapeutic targets. We analyzed the expression of 6,817 genes in diagnostic tumor specimens from DLBCL patients who received cyclophosphamide, adriamycin, vincristine and prednisone (CHOP)-based chemotherapy, and applied a supervised learning prediction method to identify cured versus fatal or refractory disease. The algorithm classified two categories of patients with very different five-year overall survival rates (70% versus 12%). The model also effectively delineated patients within specific IPI risk categories who were likely to be cured or to die of their disease. Genes implicated in DLBCL outcome included some that regulate responses to B-cell-receptor signaling, critical serine/threonine phosphorylation pathways and apoptosis. Our data indicate that supervised learning classification techniques can predict outcome in DLBCL and identify rational targets for intervention.

19.

Index of unidentified ectomycorrhizae

R. Agerer 《Mycorrhiza》1994,4(4):183-184

20.

Index of unidentified ectomycorrhizae

R. Agerer 《Mycorrhiza》1993,2(4):183-183
