Combining antigenicity scales based on physicochemical properties with the sliding-window method, an averaging algorithm and a subsequent search for the maximum value is the classical approach to B-cell epitope prediction. However, recent studies have shown that even the best classical methods correlate poorly with experimental data. We review both classical and novel algorithms and present our own implementation of these algorithms. The AAPPred software is available at http://www.bioinf.ru/aappred/.
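The classical pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the AAPPred implementation. The scale is passed in as a plain dictionary, and the values in the usage example are hypothetical placeholders rather than a published antigenicity scale. Each residue is mapped to its scale value, the values are averaged over a sliding window, and the window with the highest average is reported as the candidate epitope.

```python
from typing import Dict, List, Tuple


def window_scores(seq: str, scale: Dict[str, float], window: int = 7) -> List[float]:
    """Average scale values over every window of `window` consecutive residues.

    scores[k] is the mean over residues k .. k + window - 1.
    """
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    values = [scale[aa] for aa in seq]
    # Running sum: add the residue entering the window, drop the one leaving it.
    current = sum(values[:window])
    scores = [current / window]
    for i in range(window, len(values)):
        current += values[i] - values[i - window]
        scores.append(current / window)
    return scores


def best_window(seq: str, scale: Dict[str, float], window: int = 7) -> Tuple[int, float]:
    """Return (start index, average score) of the highest-scoring window."""
    scores = window_scores(seq, scale, window)
    start = max(range(len(scores)), key=scores.__getitem__)
    return start, scores[start]


if __name__ == "__main__":
    # Hypothetical toy scale: K and D score high, L scores low.
    toy_scale = {"K": 1.0, "D": 1.0, "L": -1.0}
    print(best_window("LLLKDKLLL", toy_scale, window=3))  # prints (3, 1.0)
```

The running sum keeps the scan linear in sequence length. The peak-picking step is the part that the text reports correlating poorly with experimental data.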
Gram-negative bacteria have five major subcellular localization sites: the cytoplasm, the periplasm, the inner membrane, the outer membrane, and the extracellular space. The subcellular location of a protein can provide valuable information about its function. With the rapid increase in sequenced genomic data, an automated and accurate tool for predicting subcellular localization has become increasingly important. We present an approach to predicting subcellular localization in Gram-negative bacteria. The method uses support vector machines trained on multiple feature vectors based on n-peptide compositions. On a standard dataset of 1443 proteins, the overall prediction accuracy reaches 89%, which, to the best of our knowledge, is the highest prediction rate ever reported. Our prediction accuracy is 14% higher than that of the recently developed multimodular PSORT-B. Because of its simplicity, this approach can easily be extended to other organisms and should be a useful tool for high-throughput, large-scale analysis of proteomic and genomic data.
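An n-peptide composition feature vector, the kind of input the SVMs above are trained on, can be computed as follows. This is a minimal sketch under stated assumptions: only the 20 standard amino acids are counted, n-peptides containing any other symbol are skipped, and counts are normalized to relative frequencies. The exact encoding used in the study is not given here.

```python
from itertools import product
from typing import List

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def npeptide_composition(seq: str, n: int = 2) -> List[float]:
    """Relative frequency of each of the 20**n possible n-peptides in `seq`.

    n-peptides containing non-standard residues (e.g. X) are skipped.
    """
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    index = {k: i for i, k in enumerate(keys)}
    counts = [0.0] * len(keys)
    total = 0
    for i in range(len(seq) - n + 1):
        pos = index.get(seq[i:i + n])
        if pos is not None:
            counts[pos] += 1
            total += 1
    # Normalize so that sequences of different lengths are comparable.
    return [c / total for c in counts] if total else counts


if __name__ == "__main__":
    features = [npeptide_composition("MKVLAAGIVG", n) for n in (1, 2)]
    print([len(f) for f in features])  # prints [20, 400]
```

Vectors for several values of n could each be fed to a separate SVM classifier, for example scikit-learn's `SVC`. How the original study combined its multiple feature vectors is not described in this abstract.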
Despite the clear differences between the amino acid sequences and enzymatic specificities of aspartic and cysteine endopeptidases, the biosynthetic processing of the lysosomal members of these two families is very similar. Using in vitro translation and pulse-chase analysis in tissue culture cells, the biosynthesis of cathepsin D, an aspartic protease, is compared with that of cathepsins B, H and L, which are cysteine proteases. Both aspartic and cysteine endopeptidases undergo cotranslational cleavage of an amino-terminal signal peptide that mediates transport across the endoplasmic reticulum (ER) membrane. High-mannose carbohydrate is also added cotranslationally in the lumen of the ER. Proteases of both classes are initially synthesized as inactive proenzymes with amino-terminal activation peptides. Removal of the propeptide generates an active single-chain enzyme. Whether the single-chain enzyme then undergoes asymmetric cleavage into a light and a heavy chain appears to be cell-type specific. Finally, late in their biosynthesis, both classes of enzymes undergo amino acid trimming, losing a few residues at the cleavage site between the light and heavy chains and/or at their carboxyl termini. During biosynthesis, these enzymes are also secreted to some extent; in most cells, the secreted enzyme is the proenzyme bearing some complex carbohydrate. Under certain physiological conditions, the inactive secreted enzymes may become activated through a conformational change that may or may not result in autolysis. Analysis of the biochemical nature of the various processing steps helps define the cellular pathway followed by newly synthesized proteases targeted to the lysosome.
An inherent problem in transmembrane protein topology prediction and signal peptide prediction is the high similarity between the hydrophobic regions of a transmembrane helix and those of a signal peptide, which leads to cross-reaction between the two types of prediction. To improve predictions further, it is therefore important to build a predictor that discriminates between the two classes. In addition, successfully predicting a signal peptide at the start of a transmembrane protein yields topology information, since it dictates that the N terminus of the mature protein must be on the non-cytoplasmic side of the membrane. Here, we present Phobius, a combined transmembrane protein topology and signal peptide predictor. The predictor is based on a hidden Markov model (HMM) that models the different sequence regions of a signal peptide and the different regions of a transmembrane protein in a series of interconnected states. Training was performed on a newly assembled and curated dataset. Compared with TMHMM and SignalP, Phobius substantially reduced errors arising from cross-prediction between transmembrane segments and signal peptides: false classifications of signal peptides were reduced from 26.1% to 3.9%, and false classifications of transmembrane helices from 19.0% to 7.7%. Phobius was applied to the proteomes of Homo sapiens and Escherichia coli, where we also noted a drastic reduction of false classifications compared with TMHMM/SignalP, suggesting that Phobius is well suited for whole-genome annotation of signal peptides and transmembrane regions. The method is available at as well as at
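In an HMM-based predictor like the one described above, each residue is assigned to a model state, and the most probable state path can be found with the Viterbi algorithm. The sketch below shows Viterbi decoding on a deliberately tiny two-state model (`loop` and `tm`) over a reduced alphabet of hydrophobic (`H`) and polar (`P`) residues. The states, the alphabet and all probabilities are hypothetical. Phobius itself has many more interconnected states, including separate signal peptide regions, and its decoding procedure may differ from plain Viterbi.

```python
import math
from typing import Dict, List

Probs = Dict[str, float]


def _log(p: float) -> float:
    # Log-probability, with log(0) treated as -infinity.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: str, states: List[str], start: Probs,
            trans: Dict[str, Probs], emit: Dict[str, Probs]) -> List[str]:
    """Most probable state path for `obs`, computed in log space to avoid underflow."""
    score = [{s: _log(start[s]) + _log(emit[s][obs[0]]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for being in state s at position t.
            prev = max(states, key=lambda r: score[t - 1][r] + _log(trans[r][s]))
            score[t][s] = score[t - 1][prev] + _log(trans[prev][s]) + _log(emit[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model: H = hydrophobic residue, P = polar residue.
STATES = ["loop", "tm"]
START = {"loop": 0.9, "tm": 0.1}
TRANS = {"loop": {"loop": 0.9, "tm": 0.1}, "tm": {"loop": 0.1, "tm": 0.9}}
EMIT = {"loop": {"H": 0.2, "P": 0.8}, "tm": {"H": 0.9, "P": 0.1}}

if __name__ == "__main__":
    # The hydrophobic stretch is decoded as a membrane segment.
    print(viterbi("PPPHHHHHHPPP", STATES, START, TRANS, EMIT))
```

Because the two region types differ only in their state sequences, a model that contains both signal peptide and transmembrane states can weigh them against each other in a single decoding. This is the discrimination the abstract argues for.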
Bcs1 is a transmembrane chaperone in the mitochondrial inner membrane that is required for the assembly of mitochondrial respiratory chain Complex III. The highly conserved C-terminal region of Bcs1, including the AAA ATPase domain on the matrix side, has been shown to be essential for its chaperone function. Here we describe the importance for Bcs1 function of a short N-terminal segment located in the intermembrane space. Of the N-terminal 44 amino acid residues of yeast Bcs1, the first 37 are dispensable, whereas a hydrophobic amino acid at residue 38 is essential for integrating the Rieske iron-sulfur protein into the premature Complex III from the mitochondrial matrix. Substituting a hydrophilic residue at position 38 affects the conformation of Bcs1 and its interactions with other proteins. The evolutionarily conserved short α-helix of Bcs1 in the intermembrane space is an essential element of its chaperone function.
Introduction: Despite the unquestionable advantages of Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry Imaging (MALDI-MSI) in visualizing the spatial distribution and relative abundance of biomolecules directly on tissue, the resulting data are complex and high-dimensional. Analyzing and interpreting this large amount of information is therefore mathematically, statistically and computationally challenging.
Areas covered: This article reviews some of the challenges in data processing, with particular emphasis on machine learning techniques employed in clinical applications, and can serve as a general entry point for those who want to study the computational aspects. Several characteristics of data processing are described, highlighting their advantages and disadvantages. Different data processing approaches focused on clinical applications are also presented. A practical tutorial based on the Orange Canvas and Weka software is included to help readers become familiar with data processing.
Expert commentary: MALDI-MSI has recently gained considerable attention and has been employed successfully for research and diagnostic purposes. Data dimensionality is an important issue, and statistical methods for information-preserving data reduction represent one of the most challenging aspects. The most common data reduction methods collect independent observations into a single table. However, incorporating relational information can improve the discriminatory capability of the data.
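The "single table" style of data reduction mentioned above can be illustrated with principal component analysis (PCA). PCA is one common reduction method, chosen here as an example rather than as a method specifically endorsed by the review. The sketch applies it to a matrix whose rows are pixels and whose columns are m/z bins. The input in the usage example is random data standing in for real MALDI-MSI spectra.

```python
import numpy as np


def pca_reduce(spectra: np.ndarray, n_components: int):
    """Project spectra (rows = pixels, columns = m/z bins) onto the top principal components.

    Returns the reduced scores (pixels x n_components) and the fraction of total
    variance explained by each retained component.
    """
    centered = spectra - spectra.mean(axis=0)
    # Thin SVD: rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for an MSI data cube flattened to 500 pixels x 2000 m/z bins.
    fake_spectra = rng.random((500, 2000))
    reduced, ratio = pca_reduce(fake_spectra, n_components=10)
    print(reduced.shape, ratio.sum())
```

Each pixel is treated as an independent row, so this reduction ignores the spatial neighbourhood of pixels. That neighbourhood is exactly the relational information the commentary suggests could improve discrimination.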
Objectives: When the prognosis of COVID-19 can be detected early, the intense pressure on health services and the loss of workforce can be partially reduced. The primary purpose of this article is to determine the feature dataset, consisting of routine blood values (RBV) and demographic data, that affects the prognosis of COVID-19. The second purpose is to apply this feature dataset to supervised machine learning (ML) models in order to identify severely and mildly infected COVID-19 patients at the time of admission.
Material and methods: The study sample consists of severely (n = 192) and mildly (n = 4010) infected patients hospitalized with a diagnosis of COVID-19 between March and September 2021. The RBV data measured at admission and the age and gender characteristics of these patients were analyzed retrospectively. Features were selected using the minimum-redundancy-maximum-relevance (MRMR) method, principal component analysis, and forward multiple logistic regression analyses. The feature set was compared statistically between mildly and severely infected patients. The performance of various supervised ML models in identifying severely and mildly infected patients using the feature set was then compared.
Results: In this study, 28 RBV parameters and the age variable were identified as the feature dataset. The effect of these features on the prognosis of the disease has been clinically proven. The ML models with the highest overall accuracy in identifying the patient groups were, in order: locally weighted learning (LWL), 97.86%; K-star (K*), 96.31%; Naive Bayes (NB), 95.36%; and k-nearest neighbor (KNN), 94.05%. The models with the highest area under the receiver operating characteristic curve (AUC) values were, in order: LWL, 0.95; K*, 0.91; NB, 0.85; and KNN, 0.75.
Conclusion: The findings in this article provide significant motivation for healthcare professionals to identify severely and mildly infected COVID-19 patients at admission.
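The MRMR feature-selection step can be illustrated with a greedy sketch. The original MRMR formulation scores relevance and redundancy with mutual information. For brevity, this sketch substitutes absolute Pearson correlation, so it approximates the idea rather than reproducing the exact procedure used in the study. Features are picked one at a time. At each step, the chosen feature maximizes its relevance to the label minus its mean redundancy with the features already selected.

```python
from typing import List

import numpy as np


def mrmr_select(X: np.ndarray, y: np.ndarray, k: int) -> List[int]:
    """Greedily pick k feature columns of X (difference criterion, correlation-based).

    Relevance  = |Pearson r| between a feature and the target.
    Redundancy = mean |Pearson r| between a candidate and the already-selected features.
    """
    n_features = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    redundancy = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]  # start with the single most relevant feature
    while len(selected) < min(k, n_features):
        candidates = [j for j in range(n_features) if j not in selected]
        scores = [relevance[j] - redundancy[j, selected].mean() for j in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected


if __name__ == "__main__":
    a = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
    b = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
    c = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
    y = a + 0.5 * b
    # Column 1 is a near-copy of column 0, so MRMR prefers the weaker but non-redundant column 2.
    X = np.column_stack([a, a + 0.1 * c, b])
    print(mrmr_select(X, y, 2))  # prints [0, 2]
```

A plain relevance ranking would pick columns 0 and 1 here, even though column 1 adds almost nothing. Constant columns produce undefined (NaN) correlations and should be removed beforehand.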
Proteins play important roles in living organisms, and their function is directly linked to their structure. Because of the growing gap between the number of proteins being discovered and their functional characterization (in particular as a result of experimental limitations), reliable computational prediction of protein function has become crucial. This paper reviews the machine learning techniques used in the literature, following their evolution from simple algorithms such as logistic regression to more advanced methods such as support vector machines and modern deep neural networks. Hyperparameter optimization methods adopted to boost prediction performance are presented. In parallel, the paper reviews how the features used by these algorithms have changed, from classical physicochemical properties and amino acid composition to text-derived features from the biomedical literature and feature representations learned with autoencoders, together with feature selection and dimensionality reduction techniques. Success stories in applying these techniques to both general and specific protein function prediction are discussed.
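Logistic regression, the starting point of the progression reviewed above, can be trained with plain batch gradient descent. The sketch below is a generic NumPy implementation under simple assumptions: a fixed learning rate and epoch count, and no regularization. In a protein function setting, the rows of `X` might be composition-based feature vectors, but that pairing is an assumption made for illustration.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic(X: np.ndarray, y: np.ndarray, lr: float = 0.5, epochs: int = 1000):
    """Fit weights w and bias b by batch gradient descent on the mean cross-entropy loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)            # predicted probability of the positive class
        w -= lr * (X.T @ (p - y)) / n     # gradient of the loss w.r.t. w
        b -= lr * float(np.mean(p - y))   # gradient of the loss w.r.t. b
    return w, b


def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Class labels (0/1) using a 0.5 probability threshold."""
    return (sigmoid(X @ w + b) >= 0.5).astype(int)


if __name__ == "__main__":
    X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
    y = np.array([0, 0, 1, 1])
    w, b = train_logistic(X, y)
    print(predict(X, w, b))  # prints [0 0 1 1]
```

The learning rate and epoch count are exactly the kind of hyperparameters that the optimization methods mentioned in the review would tune.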