Similar articles (20 found)
1.

Background  

We apply a new machine learning method, the Support Vector Machine (SVM), to predict protein structural class. The SVM is applied to a database derived from SCOP, in which protein domains are classified according to their known structures, their evolutionary relationships, and the principles that govern their 3-D structure.

2.
Studies suggest that the pathogenesis of inflammatory breast cancer (IBC) is related to inflammatory manifestations accompanied by specific cellular and molecular mechanisms in the IBC tumor microenvironment (TME). IBC is characterized by significantly higher infiltration of tumor-associated macrophages (TAMs), which contribute to its metastatic process by secreting cytokines such as TNF, IL-6, IL-8, and IL-10 that enhance invasion and angiogenesis. Understanding the role of TAMs in IBC therefore first requires understanding how the IBC TME modulates their polarization. Herein, we used gene expression signatures and Synchrotron Fourier-Transform Infrared Microspectroscopy (SR-μFTIR) to study the molecular and biochemical changes, respectively, of in vitro polarized TAMs stimulated by the secretome of IBC and non-IBC cells. The gene expression signature showed significant differences in macrophage polarization-related genes between the stimulated TAMs. FTIR spectra showed absorption bands in the 1700–1500 cm⁻¹ region attributed to the protein amide I bands, ν(C=O), νAS(CN), and δ(NH), and the amide II bands, ν(CN) and δ(NH). Moreover, three peaks of different intensities and areas were detected in the lipid region of the νCH2 and νCH3 stretching modes, within the 3000–2800 cm⁻¹ range. Principal component analysis (PCA) of the second-derivative spectra of the amide regions discriminated between IBC-stimulated and non-IBC-stimulated TAMs. This study showed that IBC and non-IBC TMEs differentially modulate the polarization of TAMs, and that SR-μFTIR can detect these biochemical changes, which will help to better understand the potential role of TAMs in IBC.

3.
Prostate cancer is the most common cancer in men over 50 years of age, and nuclear magnetic resonance spectra have been shown to be sensitive enough to distinguish normal from cancerous tissue. In this paper, we propose a technique for classifying spectra from magnetic resonance spectroscopy. We studied automatic classification with and without quantification of metabolite signals. The dataset comprises 22 patient datasets with biopsy-proven cancer, from which we extracted 2464 spectra from the whole prostate, of which 1062 were localised in the peripheral zone. The spectra were manually classified into 3 categories (undetermined, healthy and pathologic) by a spectroscopist with 4 years of experience in clinical spectroscopy of prostate cancer. We used different preprocessing methods (module, phase correction only, phase correction and baseline correction) as input for a Support Vector Machine and a Multilayer Perceptron, and we compared the results with those from the expert. If we classify only healthy and pathologic spectra, we reach a total error rate of 4.51%. However, if we classify all spectra (undetermined, healthy and pathologic), the total error rate rises to 11.49%. We have shown in this paper that the best results are obtained using the pre-processed spectra without quantification as input for the classifiers, and we confirm that Support Vector Machines are more efficient than Multilayer Perceptrons in processing high-dimensional data.

4.

Background  

RNA interference (RNAi) is a naturally occurring phenomenon that results in the suppression of a target RNA sequence through a variety of possible methods and pathways. To dissect the factors that result in effective siRNA sequences, a regression kernel Support Vector Machine (SVM) approach was used to quantitatively model RNA interference activities.

5.
Comprehensive knowledge of the thermophilic mechanisms of organisms whose optimum growth temperature (OGT) ranges from 50 to 80 °C plays a major role in helping to design stable proteins. How to predict whether proteins of unknown function are thermophilic is a long-standing problem that has not been satisfactorily resolved. Chaos game representation (CGR) can uncover hidden patterns in protein sequences and can also visually reveal previously unknown structures. In this paper, using the general form of pseudo amino acid composition to represent protein samples, we propose a novel method for mapping a protein sequence to a CGR picture using the CGR algorithm. A 24-dimensional vector extracted from these CGR segments and the first two PCA features are used to classify thermophilic and mesophilic proteins with a Support Vector Machine (SVM). Our method is evaluated by the jackknife test. For the 24-dimensional vector, the accuracy is 0.8792 and the Matthews Correlation Coefficient (MCC) is 0.7587. The 26-dimensional vector obtained by hybridizing with the PCA components performs highly satisfactorily, achieving an accuracy of 0.9944 and an MCC of 0.9888. The results show the effectiveness of the new hybrid method.
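The two figures of merit quoted in this abstract, accuracy and MCC, are both computed from the counts of a binary confusion matrix. The following is a minimal sketch in Python; the function name and the counts used in the usage note are illustrative assumptions and are not taken from the paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, Matthews correlation coefficient) for binary counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives and false negatives.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, mcc
```

For example, `accuracy_and_mcc(50, 40, 5, 5)` evaluates the metrics for a hypothetical classifier with 50 true positives, 40 true negatives and 5 errors of each kind. Unlike accuracy, MCC stays informative when the two classes are unbalanced, which is presumably why both are reported.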

6.
Surface‐enhanced Raman spectroscopy (SERS) is garnering considerable attention for the swift diagnosis of pathogens and abnormal biological states, i.e., cancers. In this work, a simple, fast and inexpensive optical sensing platform is developed through the design of SERS sampling and data analysis. As pretreatment for the spectral measurement, a gold nanoparticle colloid was mixed with serum from patients with colorectal cancer (CRC). The droplet of particle‐serum mixture formed a coffee‐ring‐like region at the rim, providing strong and stable SERS profiles. The spectra obtained from cancer patients and healthy volunteers were analyzed by unsupervised principal component analysis (PCA) and by a supervised machine learning model, the support‐vector machine (SVM). The results demonstrate that the SVM model outperforms PCA in classifying samples for CRC diagnosis. In addition, the carcinoembryonic antigen values from the blood samples were combined with the corresponding SERS spectra for the SVM calculation, yielding improved prediction results.

7.

Background  

DNA repair is the general term for the collection of critical mechanisms that repair many forms of DNA damage, such as that caused by methylation or ionizing radiation. DNA repair has mainly been studied in experimental and clinical settings, and relatively few information-based approaches to extracting new DNA repair knowledge exist. As a first step, automatic detection of DNA repair proteins in genomes via informatics techniques is desirable; however, there are many forms of DNA repair, and identifying and classifying repair proteins with a single optimal method is not straightforward. We study the ability of homology-based and machine learning-based methods to identify and classify DNA repair proteins, and we scan vertebrate genomes for the presence of novel repair proteins. Combinations of primary sequence polypeptide frequency, secondary structure, and homology information are used as input features for a Support Vector Machine (SVM).

8.
Malignant tumors have high metabolic and perfusion rates, which result in a unique temperature distribution compared to healthy tissues. Here, we sought to characterize the thermal response of the cervix following brachytherapy in women with advanced cervical carcinoma. Six patients underwent imaging with a thermal camera before a brachytherapy treatment session and after a 7-day follow-up period. A dedicated algorithm was used to calculate and store the texture parameters of the examined tissues across all time points. We used supervised machine learning classification methods (K Nearest Neighbors and Support Vector Machine) and an unsupervised machine learning classification method (K-means). Our algorithms demonstrated a 100% detection rate for physiological changes in cervical tumors before and after brachytherapy. Thus, we showed that thermal imaging combined with advanced feature extraction could potentially be used to detect tissue-specific changes in the cervix in response to local brachytherapy for cervical cancer.

9.
Learning to categorise sensory inputs by generalising from a few examples whose category is precisely known is a crucial step for the brain to produce appropriate behavioural responses. At the neuronal level, this may be performed by adaptation of synaptic weights under the influence of a training signal, in order to group spiking patterns impinging on the neuron. Here we describe a framework that allows spiking neurons to perform such “supervised learning”, using principles similar to the Support Vector Machine, a well-established and robust classifier. Using a hinge-loss error function, we show that requesting a margin similar to that of the SVM improves performance on linearly non-separable problems. Moreover, we show that using pools of neurons to discriminate categories can also increase the performance by sharing the load among neurons.
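The hinge-loss error function with a margin, as mentioned in this abstract, can be illustrated on a plain linear unit. This is only a sketch under that simplifying assumption: it is not the spiking-neuron framework of the paper, and the names `weights`, `bias` and `margin` are chosen here for illustration.

```python
def hinge_loss(weights, bias, samples, labels, margin=1.0):
    """Mean hinge loss max(0, margin - y * (w . x + b)) over a labelled set.

    labels must be +1 or -1. A sample contributes zero loss only when it is
    classified correctly AND lies at least `margin` away from the boundary.
    """
    total = 0.0
    for x, y in zip(samples, labels):
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        total += max(0.0, margin - y * score)
    return total / len(samples)
```

With `margin=0.0` the loss only penalises misclassified samples; a positive margin also penalises correct but marginal ones, which is the SVM-like behaviour the abstract refers to.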

10.
Apoptosis proteins have a central role in the development and homeostasis of an organism. These proteins are very important for understanding the mechanism of programmed cell death, and their function is related to their types. According to the classification scheme by Zhou and Doctor (2003), the apoptosis proteins are categorized into the following four types: (1) cytoplasmic protein; (2) plasma membrane-bound protein; (3) mitochondrial inner and outer proteins; (4) other proteins. A powerful learning machine, the Support Vector Machine, is applied to predict the type of a given apoptosis protein by incorporating the sqrt-amino acid composition effect. High success rates were obtained by the re-substitution test (98/98 = 100%) and the jackknife test (89/98 = 90.8%).
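The abstract does not define the sqrt-amino acid composition. One plausible reading, sketched below in Python, is a 20-dimensional vector holding the square root of each amino acid's relative frequency; this definition, the function name and the residue ordering are assumptions made here for illustration, not the authors' stated method.

```python
import math

# The 20 standard amino acids, in alphabetical one-letter order (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sqrt_composition(sequence):
    """Return the square roots of the amino acid frequencies of `sequence`.

    Non-standard residue letters are ignored. Because the frequencies sum
    to 1, the resulting vector always has unit Euclidean length.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    return [math.sqrt(residues.count(aa) / n) for aa in AMINO_ACIDS]
```

A vector such as `sqrt_composition("MKVLAAGIVGL")` could then be passed to any SVM implementation as the feature vector of one protein.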

11.
This study reports the preparation of silver nanoparticles (SNPs) using Microsorum pteropus methanol extract, as a new approach in the development of therapeutic strategies against diseases caused by oxidative stress, reactive oxygen, and nitrogen species. During extraction and isolation from M. pteropus, X-ray single-crystal structural analysis of sucrose was successfully carried out. 1,1-Diphenyl-2-picrylhydrazyl (DPPH) and hydrogen peroxide scavenging assays were used to confirm the antioxidant potential. Preparation of the SNPs was confirmed by ultraviolet–visible (UV–Vis) spectra with peaks between 431 and 436 nm. Infrared (IR) analysis of the synthesized nanoparticles showed OH and NH functional groups of alcohol, phenol, and amine, and aliphatic CH stretching vibrations of hydrocarbon chains. The SNPs showed significant antioxidant activity, with DPPH reduction at an IC₅₀ value of 47.0 µg/mL and hydrogen peroxide scavenging at an IC₅₀ value of 35.8 µg/mL, indicating their capability to eliminate potentially damaging oxidants involved in oxidative stress and related diseases.

12.
The annotation of the well-studied organism Saccharomyces cerevisiae has been improving over the past decade, yet debates over the number of biologically significant open reading frames (ORFs) in the yeast genome remain unresolved. We revisited the total count of protein-coding genes in the S. cerevisiae S288c genome using a theoretical approach that combines the Support Vector Machine (SVM) method with six widely used measurements of sequence statistical features. The accuracy of our method is over 99.5% in 10-fold cross-validation. Based on the annotation data in the Saccharomyces Genome Database (SGD), we studied the coding capacity of all 1744 ORFs that lack experimental results and suggest that, after removing 488 spurious ORFs, the overall number of chromosomal ORFs encoding proteins in yeast should be 6091. The importance of the present work lies in at least two aspects. First, cross-validation and retrospective examination showed the fidelity of our method in recognizing ORFs that likely encode proteins. Second, we have provided a web service, accessible at http://cobi.uestc.edu.cn/services/yeast/, which enables the prediction of protein-coding ORFs of the genus Saccharomyces with high accuracy.

13.
14.
Prokaryotic proteins are regulated by pupylation, a type of post-translational modification that contributes to cellular function in bacterial organisms. In the pupylation process, the prokaryotic ubiquitin-like protein (Pup), in a manner functionally analogous to ubiquitination, tags target proteins for proteasomal degradation. To date, several experimental methods have been developed to identify pupylated proteins and their pupylation sites, but these experimental methods are generally laborious and costly. Therefore, computational methods that can accurately predict potential pupylation sites based on protein sequence information are highly desirable. In this paper, a novel predictor termed pbPUP has been developed for accurate prediction of pupylation sites. In particular, a sophisticated sequence encoding scheme, the profile-based composition of k-spaced amino acid pairs (pbCKSAAP), is used to represent the sequence patterns and evolutionary information of the sequence fragments surrounding pupylation sites. Then, a Support Vector Machine (SVM) classifier is trained using the pbCKSAAP encoding scheme. The final pbPUP predictor achieves an AUC value of 0.849 in 10-fold cross-validation tests and outperforms other existing predictors on a comprehensive independent test dataset. The proposed method is anticipated to be a helpful computational resource for the prediction of pupylation sites. The web server and curated datasets in this study are freely available at http://protein.cau.edu.cn/pbPUP/.
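To give a feel for the encoding named in this abstract, the sketch below computes the plain, sequence-based composition of k-spaced amino acid pairs (CKSAAP) for one fragment. It is not the profile-based pbCKSAAP variant used by pbPUP, which draws on evolutionary profiles; the function name, the default `k_max`, and the normalisation by the number of pair positions are assumptions made here for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(fragment, k_max=2):
    """Encode `fragment` as pair frequencies for gaps k = 0 .. k_max.

    For each k, a pair consists of residues at positions i and i + k + 1.
    Each block of 400 values is normalised by the number of such positions,
    so the result has 400 * (k_max + 1) entries.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        positions = len(fragment) - k - 1
        for i in range(max(positions, 0)):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:  # skip pairs with non-standard residues
                counts[pair] += 1
        features.extend(
            counts[p] / positions if positions > 0 else 0.0 for p in PAIRS
        )
    return features
```

Calling `cksaap(fragment)` on the window of residues around a candidate lysine would give a fixed-length vector suitable as input to an SVM classifier.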

15.
Historically, probabilistic models for decision support have focused on discrimination, e.g., minimizing the ranking error of predicted outcomes. Unfortunately, these models ignore another important aspect, calibration, which indicates the magnitude of correctness of model predictions. Using discrimination and calibration simultaneously can be helpful for many clinical decisions. We investigated tradeoffs between these goals and developed a unified maximum-margin method to handle them jointly. Our approach, called Doubly Optimized Calibrated Support Vector Machine (DOC-SVM), concurrently optimizes two loss functions: the ridge regression loss and the hinge loss. Experiments using three breast cancer gene-expression datasets (i.e., GSE2034, GSE2990, and Chanrion's datasets) showed that our model generated better-calibrated outputs than other state-of-the-art models such as the Support Vector Machine (p = 0.03, p = 0.13, and p < 0.001) and Logistic Regression (p = 0.006, p = 0.008, and p < 0.001). DOC-SVM also demonstrated better discrimination (i.e., higher AUCs) than the Support Vector Machine (p = 0.38, p = 0.29, and p = 0.047) and Logistic Regression (p = 0.38, p = 0.04, and p < 0.0001). DOC-SVM produced a model that was better calibrated without sacrificing discrimination, and hence may be helpful in clinical decision making.

16.
《BBA》2020,1861(5-6):148173
Infrared absorption bands associated with the neutral state of quinones in the A1 binding site in photosystem I (PSI) have been difficult to identify in the past. This problem is addressed here, where time-resolved step-scan FTIR difference spectroscopy at 77 K has been used to study PSI with six different quinones incorporated into the A1 binding site. (P700⁺A1⁻ – P700A1) and (A1⁻ – A1) FTIR difference spectra (DS) were obtained for PSI with the different quinones incorporated, and several double-difference spectra (DDS) were constructed from the DS. From analysis of the DS and DDS, in combination with density functional theory based vibrational frequency calculations of the quinones, the neutral state bands of the incorporated quinones are identified and assigned. For neutral PhQ in the A1 binding site, infrared absorption bands were identified near 1665 and 1635 cm⁻¹ that are due to the C1=O and C4=O stretching vibrations of the incorporated PhQ, respectively. These assignments indicate a 30 cm⁻¹ separation between the C1=O and C4=O modes, considerably less than the ~80 cm⁻¹ found for similar modes of PhQ⁻. The C4=O mode downshifts due to hydrogen bonding, so the suggestion is that hydrogen bonding is weaker for the neutral state compared to the anion state, indicating radical-induced proton dynamics associated with the quinone in the A1 binding site in PSI.

17.

Background  

The ability to distinguish between genes and proteins is essential for understanding biological text. Support Vector Machines (SVMs) have been proven to be very efficient in general data mining tasks. We explore their capability for the gene versus protein name disambiguation task.

18.
Automatic text categorization is one of the key techniques in information retrieval and data mining. Classification is usually time-consuming when the training dataset is large and high-dimensional. Many methods have been proposed to solve this problem, but few achieve satisfactory efficiency. In this paper, we present a method that combines the Latent Dirichlet Allocation (LDA) algorithm with the Support Vector Machine (SVM). LDA is first used to generate a reduced-dimensional representation of topics, which serve as features in the vector space model (VSM). This reduces the number of features dramatically while keeping the necessary semantic information. The SVM is then employed to classify the data based on the generated features. We evaluate the algorithm on the 20 Newsgroups and Reuters-21578 datasets. The experimental results show that classification based on our proposed LDA+SVM model achieves high performance in terms of precision, recall and F1 measure, and does so in a much shorter time. Our process improves greatly upon previous work in this field and shows strong potential for streamlining classification in a wide range of applications.

19.
Identifying the subcellular localization of proteins is particularly helpful in the functional annotation of gene products. In this study, we use Machine Learning and Exploratory Data Analysis (EDA) techniques to examine and characterize amino acid sequences of human proteins localized in nine cellular compartments. A dataset of 3,749 protein sequences representing human proteins was extracted from the SWISS-PROT database. Feature vectors were created to capture specific amino acid sequence characteristics. Relative to a Support Vector Machine, a Multi-layer Perceptron, and a Naive Bayes classifier, the C4.5 Decision Tree algorithm was the most consistent performer across all nine compartments in reliably predicting the subcellular localization of proteins based on their amino acid sequences (average Precision=0.88; average Sensitivity=0.86). Furthermore, EDA graphics characterized essential features of proteins in each compartment. As examples, proteins localized to the plasma membrane had higher proportions of hydrophobic amino acids; cytoplasmic proteins had higher proportions of neutral amino acids; and mitochondrial proteins had higher proportions of neutral amino acids and lower proportions of polar amino acids. These data showed that the C4.5 classifier and EDA tools can be effective for characterizing and predicting the subcellular localization of human proteins based on their amino acid sequences.

20.
Background

Toxoplasmosis, a global disease, is considered a triggering factor in the development of several clinical diseases. However, Toxoplasma gondii (T. gondii) is an understudied parasite of potential interest in obesity research. The current study aimed to explore the role of latent T. gondii infection in the pathogenesis of metabolic syndrome (MetS) in obese adolescents by studying the relationship between serum interferon-gamma [IFN-γ] and serum chemerin in the context of MetS components.

Methods

Eighty-three obese adolescents were serologically screened for T. gondii-IgG antibodies and compared to 35 age-matched healthy T. gondii-seronegative controls. Participants were evaluated for anthropometric measurements, total-fat mass [FM], trunk-FM, serum lipid profile, IFN-γ, and chemerin levels. Homeostatic Model Assessment of insulin resistance (HOMA-IR) was calculated.

Results

The prevalence of MetS was significantly higher in the obese T. gondii-seropositive group than in the obese T. gondii-seronegative group (P = 0.033). The seropositive obese MetS group displayed significantly higher trunk-FM, HOMA-IR, chemerin, and IFN-γ than the seronegative obese MetS group. Serum chemerin and IFN-γ were strongly correlated (P < 0.001) and were positively correlated with BMI, WC, total-FM, trunk-FM, HOMA-IR, cholesterol, and triglycerides, and negatively correlated with HDLC. HOMA-IR was a common predictor for serum chemerin (P = 0.030) and IFN-γ (P < 0.001).

Conclusions

The study results suggest that T. gondii infection may exert an immune-metabolic effect that may have a potential role in the development of MetS among obese adolescents.
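The abstract states that HOMA-IR was calculated but does not give the formula. The sketch below uses the widely quoted approximation, fasting glucose (mmol/L) times fasting insulin (µU/mL) divided by 22.5; that the study used exactly this form, as well as the function and parameter names, are assumptions made here.

```python
def homa_ir(fasting_glucose_mmol_l, fasting_insulin_uu_ml):
    """Homeostatic Model Assessment of insulin resistance (HOMA-IR).

    Standard approximation: glucose [mmol/L] * insulin [uU/mL] / 22.5.
    If glucose is given in mg/dL instead, the divisor becomes 405.
    """
    if fasting_glucose_mmol_l <= 0 or fasting_insulin_uu_ml <= 0:
        raise ValueError("glucose and insulin must be positive")
    return fasting_glucose_mmol_l * fasting_insulin_uu_ml / 22.5
```

For instance, a fasting glucose of 5.0 mmol/L with a fasting insulin of 9.0 µU/mL gives `homa_ir(5.0, 9.0) == 2.0`; higher values indicate greater insulin resistance.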
