Similar literature
 20 similar articles found; search time: 31 ms
1.
The pseudo amino acid (PseAA) composition can represent a protein sequence in a discrete model without completely losing its sequence-order information, and hence has been widely applied to improve the prediction quality for various protein attributes. However, different problems may need different kinds of PseAA composition. Here, we present a web-server called PseAAC at http://chou.med.harvard.edu/bioinf/PseAA/, with which users can generate the kind of PseAA composition that best fits their needs.
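As a rough illustration of how a PseAA composition keeps some sequence-order information, the sketch below computes the classic (type 1) form: 20 normalized amino acid frequencies followed by `lam` sequence-order correlation factors. It is a minimal sketch, not the web-server's code. The function names, the default `weight=0.05`, and the use of the Kyte-Doolittle hydropathy scale as the only physicochemical property are assumptions made for the example.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values. Using this single scale is an assumption
# made for illustration; other property scales can be passed in via `scales`.
KYTE_DOOLITTLE = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def standardize(scale):
    """Rescale a property to zero mean and unit standard deviation over the 20 residues."""
    mu = mean(scale.values())
    sigma = pstdev(scale.values())
    return {aa: (value - mu) / sigma for aa, value in scale.items()}


def pseaac(sequence, lam=3, weight=0.05, scales=(KYTE_DOOLITTLE,)):
    """Return a (20 + lam)-dimensional type 1 pseudo amino acid composition."""
    seq = sequence.upper()
    if any(aa not in AMINO_ACIDS for aa in seq):
        raise ValueError("sequence contains non-standard residues")
    if lam >= len(seq):
        raise ValueError("lam must be smaller than the sequence length")

    props = [standardize(scale) for scale in scales]
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

    # theta_k: mean squared property difference between residues k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        pairs = zip(seq, seq[k:])
        thetas.append(mean(mean((p[a] - p[b]) ** 2 for p in props) for a, b in pairs))

    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

The first 20 components carry the conventional composition, and the last `lam` components carry the order-dependent part; the whole vector sums to 1 by construction.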

2.
G protein-coupled receptors (GPCRs) are among the most frequent targets of therapeutic drugs. With the avalanche of newly generated protein sequences in the post-genomic age, it is highly desirable to develop an automated method that rapidly identifies GPCRs and their types in order to expedite drug discovery. A new predictor was developed by hybridizing two different modes of pseudo-amino acid composition (PseAAC): the functional domain PseAAC and the low-frequency Fourier spectrum PseAAC. The new predictor is called GPCR-2L, where "2L" means that it is a two-layer predictor: the first-layer prediction engine identifies a query protein as GPCR or non-GPCR; if it is a GPCR, the prediction automatically continues to identify it as belonging to one of the following six types: (1) rhodopsin-like (Class A), (2) secretin-like (Class B), (3) metabotropic glutamate/pheromone (Class C), (4) fungal pheromone (Class D), (5) cAMP receptor (Class E), or (6) frizzled/smoothened family (Class F). The overall success rate of GPCR-2L in identifying proteins as GPCRs or non-GPCRs is over 97.2%, while that in identifying GPCRs among their six types is over 97.8%. These success rates were derived by rigorous jackknife cross-validation on a stringent benchmark dataset, in which none of the included proteins had ≥40% pairwise sequence identity to any other protein in the same subset. As a user-friendly web-server, GPCR-2L is freely accessible to the public at http://icpr.jci.edu.cn/, where one can obtain the two-level results in about 20 s for a query protein sequence of 500 amino acids; longer sequences generally need more time. The high success rates reported here indicate that combining functional domain information with low-frequency Fourier spectrum analysis is an effective approach to identifying GPCRs and their types. It is anticipated that GPCR-2L may become a useful tool for both basic research and drug development in the areas related to GPCRs.
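The two-layer control flow described in this abstract can be sketched as follows. This shows only the dispatch logic; the two classifier callables (`is_gpcr`, `gpcr_type`) and the function name `two_layer_predict` are hypothetical placeholders, not part of GPCR-2L.

```python
from typing import Callable, Optional, Sequence

# The six GPCR types named in the abstract.
GPCR_TYPES = (
    "rhodopsin-like (Class A)",
    "secretin-like (Class B)",
    "metabotropic glutamate/pheromone (Class C)",
    "fungal pheromone (Class D)",
    "cAMP receptor (Class E)",
    "frizzled/smoothened family (Class F)",
)


def two_layer_predict(
    features: Sequence[float],
    is_gpcr: Callable[[Sequence[float]], bool],
    gpcr_type: Callable[[Sequence[float]], str],
) -> Optional[str]:
    """Run layer 1 (GPCR or not); only if positive, run layer 2 (which type)."""
    if not is_gpcr(features):
        return None  # layer 1 rejected the query, so layer 2 is never called
    label = gpcr_type(features)
    if label not in GPCR_TYPES:
        raise ValueError(f"unexpected type label: {label!r}")
    return label
```

Keeping the layers as separate callables means each can be trained and validated on its own benchmark subset, which matches how the two success rates are reported separately.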

3.
As one of the most important posttranslational modifications (PTMs), ubiquitination plays an important role in regulating a variety of biological processes, such as signal transduction, cell division, apoptosis, and immune response. Ubiquitination is also named “lysine ubiquitination” because it occurs when a ubiquitin is covalently attached to lysine (K) residues of target proteins. Given an uncharacterized protein sequence that contains many lysine residues, which of them are ubiquitination sites and which are not? With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable for both basic research and drug development to develop an automated method for rapidly and accurately annotating the ubiquitination sites in proteins. In view of this, a new predictor called “iUbiq-Lys” was developed based on evolutionary information, a gray system model, and the general form of pseudo-amino acid composition. Rigorous cross-validation demonstrated that the new predictor remarkably outperformed all its counterparts. As a web-server, iUbiq-Lys is accessible to the public at http://www.jci-bioinfo.cn/iUbiq-Lys. For the convenience of most experimental scientists, we have further provided a step-by-step guide, by which users can easily get their desired results without the need to follow the complicated mathematics that were presented in this paper just for the integrity of its development process.

4.
Protein secretion plays an important role in bacterial lifestyles. Secreted proteins are crucial for bacterial pathogenesis because they let bacteria interact with their environments, in particular by delivering pathogenic and symbiotic bacteria into their eukaryotic hosts. Therefore, identification of bacterial secreted proteins is an important step in the study of various diseases and the corresponding drugs. In this paper, by fusing several new features into Chou’s pseudo-amino acid composition (PseAAC), two support vector machine (SVM)-based ternary classifiers are developed to predict secreted proteins of Gram-negative and Gram-positive bacteria. For the two types of bacteria, accuracies of 94.03% and 94.36% are obtained in distinguishing classically secreted, non-classically secreted and non-secreted proteins by our method. To compare the practical ability of our method in identifying bacterial secreted proteins with that of six published methods, proteins in Escherichia coli and Bacillus subtilis are collected to construct the test sets of Gram-negative and Gram-positive bacteria, and the prediction results of our method are comparable to those of existing methods. When applied to two public independent data sets for predicting non-classically secreted proteins (NCSPs), it also yields satisfactory results for Gram-negative bacterial proteins. The prediction server SecretP can be accessed at http://cic.scu.edu.cn/bioinformatics/secretPV2/index.htm.

5.
6.
We present a software system, BASIO, that allows one to segment a sequence into regions with homogeneous nucleotide composition at a desired length scale. The system can work with an arbitrary alphabet and therefore can be applied to various (e.g. protein) sequences. Several sequences of complete genomes of eukaryotes are used to demonstrate the efficiency of the software. AVAILABILITY: The BASIO suite is available for non-commercial users free of charge as a set of executables and accompanying segmentation scenarios from http://www.imb.ac.ru/compbio/basio. To obtain the source code, contact the authors.
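To give a feel for what "segmenting a sequence into compositionally homogeneous regions" involves, the sketch below finds the single split point that most reduces the length-weighted Shannon entropy of the symbol composition. This is a swapped-in, generic entropic criterion chosen for illustration and is not BASIO's algorithm; the names `entropy`, `best_split`, and `min_len` are hypothetical. Like the tool described, it makes no assumption about the alphabet.

```python
from collections import Counter
from math import log2


def entropy(counts, total):
    """Shannon entropy (in bits) of a symbol-count table."""
    return -sum(c / total * log2(c / total) for c in counts.values() if c)


def best_split(sequence, min_len=1):
    """Return (position, gain) of the split that most reduces weighted entropy.

    The sequence is cut into sequence[:position] and sequence[position:].
    """
    n = len(sequence)
    if n < 2 * min_len:
        raise ValueError("sequence too short for the requested minimum segment length")

    total = Counter(sequence)
    h_all = entropy(total, n)
    left = Counter()
    best_pos, best_gain = None, -1.0

    for i in range(1, n):
        left[sequence[i - 1]] += 1
        if i < min_len or n - i < min_len:
            continue
        right = total - left  # Counter subtraction keeps only positive counts
        gain = h_all - (i / n) * entropy(left, i) - ((n - i) / n) * entropy(right, n - i)
        if gain > best_gain:
            best_pos, best_gain = i, gain
    return best_pos, best_gain
```

Applying such a split recursively to each half, with a stopping rule tied to the desired length scale, would give a full segmentation; the stopping rule is where real tools differ most.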

7.
Pattern recognition receptors (PRRs) play a key role in the innate immune response by recognizing pathogen-associated molecular patterns derived from a diverse collection of microbial pathogens. PRRs form a superfamily of proteins related to host health and disease. Thus, prediction of the PRR family might supply biologically significant information for functional annotation of PRRs and development of novel drugs. In this paper, a computational method is proposed for predicting the families of PRRs. The prediction was performed on the basis of amino acid composition and pseudo-amino acid composition (PseAAC) from primary sequences of proteins using support vector machines. A non-redundant dataset consisting of 332 PRRs in seven families was constructed for training and testing. It was demonstrated that different families of PRRs were quite closely correlated with amino acid composition as well as PseAAC. In the jackknife test, overall accuracies of amino acid composition-based and PseAAC-based classifiers reached 96.1% and 97.9%, respectively. The results indicate that families of PRRs are predictable with high accuracy. It is anticipated that this computational method might be a powerful tool for the automated assignment of families of PRRs.
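The jackknife test mentioned here is leave-one-out cross-validation: each protein is held out in turn, the classifier is trained on the rest, and the held-out protein is predicted. A minimal sketch follows. The study used support vector machines; the nearest-neighbour rule below is only a dependency-free stand-in, and the function names are hypothetical.

```python
def jackknife_accuracy(samples, labels, predict):
    """Leave-one-out accuracy: hold out each sample once and predict it from the rest."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def nearest_neighbor(train_x, train_y, query):
    """Stand-in classifier: label of the closest training vector (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[dists.index(min(dists))]
```

Because every sample is tested exactly once against a model that never saw it, the result does not depend on a random split, which is why this test is often preferred for small benchmark datasets.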

8.
Knowledge of membrane protein type often provides crucial hints toward determining the function of an uncharacterized membrane protein. With the avalanche of new protein sequences emerging during the post-genomic era, it is highly desirable to develop an automated method that can serve as a high-throughput tool for identifying the types of newly found membrane proteins from their primary sequences, so that they can be annotated in a timely manner for reference in both basic research and drug discovery. Based on the concept of pseudo-amino acid composition [K.C. Chou, Proteins: Struct. Funct. Genet. 43 (2001) 246-255; Erratum: Proteins: Struct. Funct. Genet. 44 (2001) 60], which makes it possible to incorporate a considerable amount of sequence-order effects by representing a protein sample as a set of discrete numbers, a novel predictor, the so-called "optimized evidence-theoretic K-nearest neighbor" or "OET-KNN" classifier, was proposed. It was demonstrated via the self-consistency test, jackknife test, and independent dataset test that the new predictor, compared with many previous ones, yielded higher success rates in most cases. The new predictor can also be used to improve the prediction quality for, among many other protein attributes, structural class, subcellular localization, enzyme family class, and G-protein coupled receptor type. The OET-KNN classifier will be available as a web-server at http://www.pami.sjtu.edu.cn/kcchou.

9.
Introduction: The study of microbial communities based on the combined analysis of genomic and proteomic data – called metaproteogenomics – has gained increased research attention in recent years. This relatively young field aims to elucidate the functional and taxonomic interplay of proteins in microbiomes and its implications on human health and the environment.

Areas covered: This article reviews bioinformatics methods and software tools dedicated to the analysis of data from metaproteomics and metaproteogenomics experiments. In particular, it focuses on the creation of tailored protein sequence databases, on the optimal use of database search algorithms including methods of error rate estimation, and finally on taxonomic and functional annotation of peptide and protein identifications.

Expert opinion: Recently, various promising strategies and software tools have been proposed for handling typical data analysis issues in metaproteomics. However, severe challenges remain that are highlighted and discussed in this article; these include: (i) robust false-positive assessment of peptide and protein identifications, (ii) complex protein inference against a background of highly redundant data, (iii) taxonomic and functional post-processing of identification data, and finally, (iv) the assessment and provision of metrics and tools for quantitative analysis.


10.
Ensemble classifier for protein fold pattern recognition   Cited by: 4 (self-citations: 0, citations by others: 4)
MOTIVATION: Prediction of protein folding patterns is one level deeper than that of protein structural classes, and hence is much more complicated and difficult. To deal with such a challenging problem, the ensemble classifier was introduced. It was formed by a set of basic classifiers, each trained on a different parameter system, such as predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, as well as different dimensions of pseudo-amino acid composition, which were extracted from a training dataset. The operation engine for the constituent individual classifiers was the OET-KNN (optimized evidence-theoretic k-nearest neighbors) rule. Their outcomes were combined through weighted voting to give a final determination for classifying a query protein. The task was to find the true fold among the 27 possible patterns. RESULTS: The overall success rate thus obtained was 62% for a testing dataset where most of the proteins have <25% sequence identity with the proteins used in training the classifier. Such a rate is 6-21% higher than the corresponding rates obtained by various existing NN (neural network) and SVM (support vector machine) approaches, implying that the ensemble classifier is very promising and might become a useful vehicle in protein science, as well as proteomics and bioinformatics. AVAILABILITY: The ensemble classifier, called PFP-Pred, is available as a web-server at http://202.120.37.186/bioinf/fold/PFP-Pred.htm for public usage.
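The fusion step, in which the outcomes of the individual classifiers are combined through weighted voting, can be sketched as below. How PFP-Pred derives its weights is not stated in the abstract, so the weights are simply inputs here; the function name `weighted_vote` and the fold labels in the usage line are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across base classifiers.

    predictions[i] is the label chosen by classifier i; weights[i] is its weight.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")

    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)
```

For example, `weighted_vote(["fold_1", "fold_2", "fold_2"], [0.6, 0.25, 0.25])` selects `fold_1`, because one heavily weighted classifier (0.6) outvotes two lightly weighted ones (0.5 in total).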

11.
Malaria has become a cause of poverty and a major hindrance to economic development. The culprit of the disease is the parasite, which secretes an array of proteins within the host erythrocyte to facilitate its own survival. Accordingly, the secretory proteins of the malaria parasite have become a logical target for drug design against malaria. Unfortunately, with the increasing resistance to the drugs thus developed, the situation has become more complicated. To cope with the drug resistance problem, one strategy is to identify in a timely manner the proteins secreted by the malaria parasite, which can serve as potential drug targets. However, it is both expensive and time-consuming to identify the secretory proteins of the malaria parasite by experiments alone. To expedite the development of effective drugs against malaria, a computational predictor called “iSMP-Grey” was developed that can identify the secretory proteins of the malaria parasite based on the protein sequence information alone. During the prediction process a protein sample was formulated as a 60D (dimensional) feature vector formed by incorporating the sequence evolution information into the general form of PseAAC (pseudo amino acid composition) via a grey system model, which is particularly useful for solving complicated problems that lack sufficient information or need to process uncertain information. The jackknife test showed that iSMP-Grey achieved an overall success rate of 94.8%, remarkably higher than those of the existing predictors in this area. As a user-friendly web-server, iSMP-Grey is freely accessible to the public at http://www.jci-bioinfo.cn/iSMP-Grey. Moreover, for the convenience of most experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematical equations involved in this paper.

12.
MOTIVATION: Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences. RESULTS: In this paper, a support vector machine (SVM) is introduced to predict the subcellular localization of proteins from their amino acid compositions. The total prediction accuracies reach 91.4% for three subcellular locations in prokaryotic organisms and 79.4% for four locations in eukaryotic organisms. Predictions by our approach are robust to errors in the protein N-terminal sequences. This new approach provides superior prediction performance compared with existing algorithms based on amino acid composition and can be a complementary method to other existing methods based on sorting signals. AVAILABILITY: A web server implementing the prediction method is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/. SUPPLEMENTARY INFORMATION: Supplementary material is available at http://www.bioinfo.tsinghua.edu.cn/SubLoc/.
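The input representation named in this abstract, the amino acid composition, is a 20-dimensional vector of residue frequencies. A minimal sketch is given below. The function name is hypothetical, and ignoring non-standard residues (such as X or B) is an assumption of the example rather than something the abstract specifies.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20 residue frequencies, ordered as in AMINO_ACIDS.

    Non-standard symbols are ignored (an assumption of this sketch), so the
    frequencies are normalized over the standard residues only.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [count / total for count in counts]
```

Because the vector depends only on residue counts, it is unchanged by reordering the sequence, so a few wrong or missing residues shift each frequency only slightly; this is consistent with the robustness to N-terminal errors that the abstract reports.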

13.
Li ZC, Zhou XB, Dai Z, Zou XY. Amino Acids, 2009, 37(2): 415-425
Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and accurate computational determination of protein structural class is very important in protein science. One key to any computational method is an accurate representation of the protein sample. Here, based on the concept of Chou’s pseudo-amino acid composition (PseAAC; Chou, Proteins: structure, function, and genetics, 43:246–255, 2001), a novel feature extraction method that combines the continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. Firstly, a digital signal was obtained by mapping each amino acid according to various physicochemical properties. Secondly, CWT was used to extract a new feature vector based on the wavelet power spectrum (WPS), which contains richer sequence-order information in both the frequency domain and the time domain, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was formed to represent the primary sequence by coupling the amino acid composition (AAC) vector with the new WPS feature vectors projected into an orthogonal space by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality was improved, and the current approach to protein representation may serve as a useful complementary vehicle in classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein type, and protein secondary structure.

14.
Introduction: Structural characterization of low molecular weight heparin (LMWH) is critical to meet biosimilarity standards. In this context, the review focuses on structural analysis of labile sulfates attached to the side-groups of LMWH using mass spectrometry. A comprehensive review of this topic will help readers to identify key strategies for tackling the problem related to sulfate loss. At the same time, various mass spectrometry techniques are presented to facilitate compositional analysis of LMWH, mainly enoxaparin.

Areas covered: This review summarizes findings on mass spectrometry application for LMWH, including modulation of sulfates, using enzymology and sample preparation approaches. Furthermore, popular open-source software packages for automated spectral data interpretation are also discussed. Successful use of LC/MS can decipher structural composition for LMWH and help evaluate their sameness or biosimilarity with the innovator molecule. Overall, the literature has been searched using PubMed by typing various search queries such as ‘enoxaparin’, ‘mass spectrometry’, ‘low molecular weight heparin’, ‘structural characterization’, etc.

Expert commentary: This section highlights clinically relevant areas that need improvement to achieve satisfactory commercialization of LMWHs. It also primarily emphasizes the advancements in instrumentation related to mass spectrometry, and discusses building automated software for data interpretation and analysis.


15.
16.
Context: Drugs such as positive allosteric modulators (PAMs) produce complex behaviors when acting on tissues in different physiological contexts in vivo.

Objective: This study describes the use of functional assays of varying receptor sensitivity to unveil the various behaviors of PAMs and thus quantify allosteric effect through system independent scales.

Materials and methods: Muscarinic receptor activation with acetylcholine (ACh) was used to demonstrate the activity of the PAM agonist 1-(4-methoxybenzyl)-4-oxo-1,4-dihydroquinoline-3-carboxylic acid, benzyl quinolone carboxylic acid (BQCA), in terms of direct agonism, potentiation of ACh affinity, and potentiation of ACh efficacy. Concentration–response curves were fit to the functional allosteric model to yield indices of agonism (τB), effects on affinity (α cooperativity), and effects on efficacy (β cooperativity).

Results: It is shown that a highly sensitive functional assay revealed the direct efficacy of BQCA as an agonist and relatively insensitive cells (produced by chemical alkylation of muscarinic receptor with phenoxybenzamine) revealed a positive allosteric effect of BQCA on ACh efficacy. A wide range of functional assay sensitivities produced a complex pattern of behavior for BQCA all of which was accurately quantified through the system-independent parameters of the functional allosteric model.

Conclusions: The study of complex allosteric molecules in a range of functional assays of varying sensitivity allows the measurement of the complete array of activities of these molecules on receptors and also better predicts which of these activities will be seen in vivo, where a range of tissue sensitivities is encountered.
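To make the roles of τB, α, and β concrete, the sketch below evaluates one commonly used form of the operational model of allosterism with a slope factor of 1. That this is exactly the "functional allosteric model" fitted in the study is an assumption, as are the function name, the parameter names, and every numerical value in the usage note; `ka` and `kb` denote the equilibrium dissociation constants of the agonist and the modulator.

```python
def allosteric_response(a, b, ka, kb, tau_a, tau_b, alpha, beta, e_max=1.0):
    """Response to agonist concentration `a` in the presence of modulator concentration `b`.

    tau_a, tau_b : efficacy of the agonist and direct agonism of the modulator
    alpha        : cooperativity on affinity; beta: cooperativity on efficacy
    Slope factor is fixed at 1 (an assumption of this sketch).
    """
    # Stimulus from agonist-bound receptor (scaled by beta when the modulator
    # is also bound) plus stimulus from the modulator's own agonism.
    stimulus = tau_a * a * (kb + alpha * beta * b) + tau_b * b * ka
    # Terms for free receptor and all agonist/modulator-bound receptor species.
    occupancy = a * kb + ka * kb + ka * b + alpha * a * b
    return e_max * stimulus / (occupancy + stimulus)
```

With `b = 0` the expression reduces to the standard operational model of agonism, `e_max * tau_a * a / (a * (1 + tau_a) + ka)`. A low-sensitivity system corresponds to a small `tau_a` (and proportionally small `tau_b`), where direct agonism fades and an effect on efficacy (`beta` > 1) becomes visible; a high-sensitivity system corresponds to large values, where the modulator's own agonism dominates. This mirrors the pattern the results describe.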


17.
Often called biology’s version of the Swiss army knife, proteases cut long sequences of amino acids into fragments and regulate most physiological processes. They are vitally important in the life cycle. Different types of proteases have different action mechanisms and take part in different biological processes. With the avalanche of protein sequences generated during the postgenomic age, it is highly desirable for both basic research and drug design to develop a fast and reliable method for identifying the types of proteases from their sequences, or even just for deciding whether a protein is a protease at all. In this article, three recently developed identification methods are discussed: (i) FunD-PseAAC, (ii) GO-PseAAC, and (iii) FunD-PsePSSM. The first two were established by hybridizing the FunD (functional domain) approach and the GO (gene ontology) approach, respectively, with the PseAAC (pseudo amino acid composition) approach. The third method was established by fusing the FunD approach with the PsePSSM (pseudo position-specific scoring matrix) approach. Of these three methods, only FunD-PsePSSM has provided a server, called ProtIdent (protease identifier), which is freely accessible to the public at http://www.csbio.sjtu.edu.cn/bioinf/Protease. For the convenience of users, a step-by-step guide on how to use ProtIdent is given. The caveats in using ProtIdent and how to understand the success expectancy rate of a statistical predictor are also discussed. Finally, the reasons why ProtIdent can yield a high success rate in identifying proteases and their types are elucidated.

18.
Introduction: Human serum albumin (HSA) is a multifaceted protein with vital physiological functions. It is the most abundant plasma protein with inherent capability to bind to diverse ligands, and thus susceptible to various post-translational modifications (PTMs) which alter its structure and functions. One such PTM is glycation, a non-enzymatic reaction between reducing sugar and protein leading to formation of heterogeneous advanced glycation end products (AGEs). Glycated albumin (GA) concentration increases significantly in diabetes and is implicated in development of secondary complications.

Areas covered: In this review, we discuss in depth, formation of GA and its consequences, approaches used for characterization and quantification of GA, milestones in GA proteomics, clinical relevance of GA as a biomarker, significance of maintaining abundant levels of albumin and future perspectives.

Expert commentary: Elevated GA levels are associated with development of insulin resistance as well as secondary complications, in healthy and diabetic individuals respectively. Mass spectrometry (MS) based approaches aid in precise characterization and quantification of GA including early and advanced glycated peptides, which can be useful in prediction of the disease status. Thus GA has evolved to be one of the best candidates in the pursuit of diagnostic markers for prediction of prediabetes and diabetic complications.


19.
Objectives: To evaluate the association between nutritional status, resting energy expenditure (REE), and protein oxidative stress in patients after kidney transplantation (KT).

Methodology: The study evaluated 35 transplant patients at hospital discharge and 3 months later for body composition, REE (by indirect calorimetry), and injury factor (IF), as well as serum urea, creatinine, glucose, albumin, total protein, advanced oxidation protein products (AOPP), and vitamin C.

Results: Three months after discharge, there was an improvement in renal function, nutritional status, and oxidative stress, with a normalization of REE/kg. There was an increase in body weight, mainly in fat mass. The correlations showed that a greater cold ischemia time resulted in a deeper decline in vitamin C; a longer hospital stay resulted in a greater reduction in AOPP; and a higher preoperative body weight was accompanied by greater increases in body fat and glucose after transplantation. Decreases in REE and IF were accompanied by increases in total protein. Finally, at hospital discharge there was a greater gain in weight, lower albumin, and total protein among individuals who had rejection episodes.

Discussion: KT improves many metabolic abnormalities, with improvement of nutritional status and oxidative stress and normalization of REE.


20.
Introduction: Amniotic fluid (AF) is a dynamic and complex mixture that reflects the physiological condition of the developing fetus. In the last decade, proteomic analyses of AF from normal pregnancies at 16–18 weeks have been carried out to characterize the composition and functions of this fluid. Other body fluids such as urine, sweat, and tears are being used for the diagnosis of disease, but insight into the protein biomarkers of amniotic fluid can save the fetus and mother from future complications.

Areas covered: We have covered the proteomics of amniotic fluid done since 2000, in order to strengthen the establishment of these techniques as a recognized diagnostic tool in the field. After classifying the diseases based on chromosomal aneuploidies, gestational changes, and inflammation caused during pregnancy; we have focused on amniotic fluid to detect various complications during and post pregnancy and its effect on the fetomaternal relationship.

Expert comment: The main protein biomarkers responsible for various syndromes, diseases, and complications have been summarized. Major proteins identified for gestational conditions are IGFBP-1, fibrinogen, neutrophil defensins like calgranulins A and C, cathelicidin, APOA1, TRFE, etc. Validation of each technique and establishment of a single standardized biomarker for each diagnosis, to avoid overlap between different diseases, are required. After certain improvements, the proteomics approach can be considered for the diagnosis of diseases associated with fetal-maternal health.

