Similar articles
 Found 20 similar articles; search took 125 ms
1.
Hu LL, Niu S, Huang T, Wang K, Shi XH, Cai YD. PLoS ONE 2010, 5(12): e15917

Background

Hydroxylation is an important post-translational modification and closely related to various diseases. Besides the biotechnology experiments, in silico prediction methods are alternative ways to identify the potential hydroxylation sites.

Methodology/Principal Findings

In this study, we developed a novel sequence-based method for identifying the two main types of hydroxylation sites: hydroxyproline and hydroxylysine. First, feature selection was performed on three kinds of features computed in a sliding window of 13 amino acids: amino acid indices (AAindex), which include various physicochemical and biochemical properties of amino acids; Position-Specific Scoring Matrices (PSSM), which represent evolutionary information of amino acids; and structural disorder of amino acids. The prediction models were then built using the incremental feature selection method. As a result, the prediction accuracies were 76.0% and 82.1%, evaluated by jackknife cross-validation on the hydroxyproline and hydroxylysine datasets, respectively. Feature analysis suggested that the physicochemical properties, biochemical properties, and evolutionary information of amino acids contribute much to the identification of protein hydroxylation sites, while structural disorder had little relation to protein hydroxylation. It was also found that the amino acid adjacent to the hydroxylation site tends to exert more influence on hydroxylation determination than other sites.

Conclusions/Significance

These findings may provide useful insights for exploiting the mechanisms of hydroxylation.
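The 13-residue sliding window described in this abstract can be illustrated with a minimal sketch. It only shows how candidate proline and lysine sites might be cut out of a sequence; the function name `extract_windows` and the use of `X` to pad the chain termini are assumptions made here for illustration, not details given in the abstract.

```python
def extract_windows(sequence, targets="PK", window=13, pad="X"):
    """Return (1-based position, residue, window) for each candidate site.

    Assumption: windows that run past either terminus are padded with `pad`;
    the abstract does not say how termini were handled.
    """
    half = window // 2
    padded = pad * half + sequence + pad * half
    sites = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # padded[i:i + window] is centered on sequence[i]
            sites.append((i + 1, residue, padded[i:i + window]))
    return sites


if __name__ == "__main__":
    for position, residue, fragment in extract_windows("GPPGKA"):
        print(position, residue, fragment)
```

Each window would then be encoded with per-residue features (AAindex values, PSSM rows, disorder scores) before feature selection; that encoding step is not shown.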

2.
The synthesis of procollagen hydroxyproline and hydroxylysine was examined in matrix-free cells which were isolated from embryonic tendon by controlled enzymic digestion and then incubated in suspension. After the cells were labeled with [14C]proline for 2 min, or about one-third the synthesis time for a Pro-α chain, [14C]hydroxyproline was found in short peptides considerably smaller than the Pro-α chains of procollagen. The results, therefore, confirmed previous reports indicating that the hydroxylation of proline can begin on nascent chains. In similar experiments in which the cells were labeled with [14C]lysine, [14C]hydroxylysine was found in short, newly synthesized peptides, providing the first evidence that the hydroxylation of lysine can also begin on nascent peptides. However, further experiments demonstrated that the synthesis of hydroxyproline and hydroxylysine continues until some time after assembly of the polypeptide chains is completed.

3.
The pattern of collagen cross-linking is tissue specific and is primarily determined by the extent of hydroxylation and oxidation of specific lysine residues in the molecule. In this study, cells of the murine pre-myoblast line C2C12 were transdifferentiated into osteoblastic cells by bone morphogenetic protein (BMP)-2 treatment, and the gene expression of lysyl hydroxylases (LH1, 2a/b, and 3) and lysyl oxidase (LOX)/lysyl oxidase-like proteins (LOXL1-4), and the extent of hydroxylysine were analyzed. After 24 h of treatment, the expression of most isoforms was upregulated up to 96 h, whereas LH2a and LOXL2 decreased with time. In the treated cells, both hydroxyproline and hydroxylysine were detected at day 7 and increased at day 14. The ratio of hydroxylysine to hydroxyproline was significantly increased at day 14. The results indicate that LHs and LOX/LOXLs are differentially responsive to BMP-induced osteoblast differentiation, which may eventually lead to the specific collagen cross-linking pattern seen in bone.

4.
Binding sites in proteins can be either specifically functional binding sites (active sites) that bind specific substrates with high affinity or regulatory binding sites (allosteric sites) that modulate the activity of functional binding sites through effector molecules. Owing to their significance in determining protein function, the identification of protein functional and regulatory binding sites is widely acknowledged as an important biological problem. In this work, we present a novel binding site prediction method, Active and Regulatory site Prediction (AR-Pred), which supplements protein geometry, evolutionary, and physicochemical features with information about protein dynamics to predict putative active and allosteric site residues. As the intrinsic dynamics of globular proteins plays an essential role in controlling binding events, we find it to be an important feature for the identification of protein binding sites. We train and validate our predictive models on multiple balanced training and validation sets with random forest machine learning and obtain an ensemble of discrete models for each prediction type. Our models for active site prediction yield a median area under the curve (AUC) of 91% and Matthews correlation coefficient (MCC) of 0.68, whereas the less well-defined allosteric sites are predicted at a lower level with a median AUC of 80% and MCC of 0.48. When tested on an independent set of proteins, our models for active site prediction show comparable performance to two existing methods and gains compared to two others, while the allosteric site models show gains when tested against three existing prediction methods. AR-Pred is available as a free downloadable package at https://github.com/sambitmishra0628/AR-PRED_source .

5.
Wang Y, Xue Z, Shen G, Xu J. Amino Acids 2008, 35(2): 295-302
Protein–RNA interactions play a key role in a number of biological processes such as protein synthesis, mRNA processing, and the assembly and function of ribosomes and eukaryotic spliceosomes. A reliable identification of RNA-binding sites in RNA-binding proteins is important for functional annotation and site-directed mutagenesis. We developed a novel method for the prediction of protein residues that interact with RNA using support vector machine (SVM) and position-specific scoring matrices (PSSMs). Two cases have been considered in the prediction of protein residues at RNA-binding surfaces. One is given the sequence information of a protein chain that is known to interact with RNA; the other is given the structural information. Thus, five different inputs have been tested. Coupled with PSI-BLAST profiles and predicted secondary structure, the present approach yields a Matthews correlation coefficient (MCC) of 0.432 by a 7-fold cross-validation, which is the best among all previously reported RNA-binding site prediction methods. When given the structural information, we have obtained an MCC value of 0.457, with PSSMs, observed secondary structure and solvent accessibility information assigned by DSSP as input. A web server implementing the prediction method is available at the following URL: .

6.
Subunit structure of wheat germ agglutinin   Cited by: 6 (self-citations: 0, citations by others: 6)
Cells isolated by enzymic digestion of embryonic tendon were incubated under N2 so that they synthesized and accumulated the unhydroxylated form of procollagen which is known as protocollagen and which is largely comprised of pro-α chains linked by interchain disulfide bonds. The cells were then exposed to O2 so that the intracellular protocollagen was hydroxylated and secreted as procollagen. When the hydroxylation was allowed to proceed at 31° or 34°, the procollagen secreted into the medium was triple-helical but its hydroxyproline content was less than two-thirds and its hydroxylysine content was less than half the control. Even when the hydroxylation was allowed to occur at 37°, the procollagen secreted by the cells was under-hydroxylated by about 15% in terms of its hydroxyproline content and about 45% in terms of its hydroxylysine content. The results may have consequences for collagen synthesis by tendons and similar tissues in vivo, since temporary anoxia in such tissues may well lead to the synthesis of a less stable procollagen or to fibers of decreased tensile strength.

7.
Nicotinamide adenine dinucleotide (NAD) plays an important role in cellular metabolism and acts as a hydride-accepting and hydride-donating coenzyme in energy production. Identification of NAD-protein interacting sites can significantly aid in understanding NAD-dependent metabolism and pathways, and it could further contribute useful information for drug development. In this study, a computational method is proposed to predict NAD-protein interacting sites using sequence information and structure-based information. All models developed in this work are evaluated using the 7-fold cross-validation technique. Results show that using the position-specific scoring matrix (PSSM) as an input feature is quite encouraging for predicting NAD interacting sites. To account for the unbalanced dataset, an ensemble support vector machine (SVM), which is an assembly of many individual SVM classifiers, is developed to predict the NAD interacting sites. The overall accuracy (Acc) thus obtained was 87.31% with a Matthews correlation coefficient (MCC) of 0.56. In contrast, the corresponding accuracy of the single SVM approach was only 80.86% with an MCC of 0.38. These results indicate that the prediction accuracy can be remarkably improved via the ensemble SVM classifier approach.
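Several abstracts in this list report both accuracy and the Matthews correlation coefficient (MCC), and this one stresses an unbalanced dataset. The short sketch below uses the standard confusion-matrix definitions of both measures; the counts in the usage example are made up for illustration and are not taken from any of the studies listed.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical unbalanced set: 10 binding residues, 90 non-binding,
    # and a classifier that predicts "non-binding" for everything.
    print(accuracy(tp=0, tn=90, fp=0, fn=10))
    print(mcc(tp=0, tn=90, fp=0, fn=10))
```

In this hypothetical case the accuracy is high although no binding site is found, while the MCC is zero, which is one reason MCC is commonly reported alongside accuracy for binding-site predictors.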

8.
Non-covalent protein-carbohydrate interactions mediate molecular targeting in many biological processes. Prediction of non-covalent carbohydrate binding sites on protein surfaces not only provides insights into the functions of the query proteins; information on key carbohydrate-binding residues could suggest site-directed mutagenesis experiments, design therapeutics targeting carbohydrate-binding proteins, and provide guidance in engineering protein-carbohydrate interactions. In this work, we show that non-covalent carbohydrate binding sites on protein surfaces can be predicted with relatively high accuracy when the query protein structures are known. The prediction capabilities were based on a novel encoding scheme of the three-dimensional probability density maps describing the distributions of 36 non-covalent interacting atom types around protein surfaces. One machine learning model was trained for each of the 30 protein atom types. The machine learning algorithms predicted tentative carbohydrate binding sites on query proteins by recognizing the characteristic interacting atom distribution patterns specific for carbohydrate binding sites from known protein structures. The prediction results for all protein atom types were integrated into surface patches as tentative carbohydrate binding sites based on normalized prediction confidence level. The prediction capabilities of the predictors were benchmarked by a 10-fold cross validation on 497 non-redundant proteins with known carbohydrate binding sites. The predictors were further tested on an independent test set with 108 proteins. The residue-based Matthews correlation coefficient (MCC) for the independent test was 0.45, with prediction precision and sensitivity (or recall) of 0.45 and 0.49 respectively. In addition, 111 unbound carbohydrate-binding protein structures for which the structures were determined in the absence of the carbohydrate ligands were predicted with the trained predictors. The overall prediction MCC was 0.49. Independent tests on anti-carbohydrate antibodies showed that the carbohydrate antigen binding sites were predicted with comparable accuracy. These results demonstrate that the predictors are among the best in carbohydrate binding site predictions to date.

9.
Prolyl hydroxylation is a PTM that plays an important role in the formation of collagen fibrils and in the oxygen-dependent regulation of hypoxia inducible factor-α (HIF-α). While this modification has been well characterized in the context of these proteins, it remains unclear to what extent it occurs in the remaining mammalian proteome. We explored this question using MS to analyze cellular extracts subjected to various fractionation strategies. In one strategy, we employed the von Hippel Lindau tumor suppressor protein, which recognizes prolyl hydroxylated HIF-α, as a scaffold for generating hydroxyproline capture reagents. We report novel sites of prolyl hydroxylation within five proteins: FK506-binding protein 10, myosin heavy chain 10, hexokinase 2, pyruvate kinase, and C-1 Tetrahydrofolate synthase. Furthermore, we show that identification of prolyl hydroxylation presents a significant technical challenge owing to widespread isobaric methionine oxidation, and that manual inspection of spectra of modified peptides in this context is critical for validation.

10.
Collagen is the most abundant protein in the human body and thereby a structural protein of considerable biotechnological interest. The complex maturation process of collagen, including essential post-translational modifications such as prolyl and lysyl hydroxylation, has precluded large-scale production of recombinant collagen featuring the biophysical properties of endogenous collagen. The characterization of new prolyl and lysyl hydroxylase genes encoded by the giant virus mimivirus reveals a method for production of hydroxylated collagen. The coexpression of a human collagen type III construct together with mimivirus prolyl and lysyl hydroxylases in Escherichia coli yielded up to 90 mg of hydroxylated collagen per liter culture. The respective levels of prolyl and lysyl hydroxylation reaching 25% and 26% were similar to the hydroxylation levels of native human collagen type III. The distribution of hydroxyproline and hydroxylysine along recombinant collagen was also similar to that of native collagen as determined by mass spectrometric analysis of tryptic peptides. The triple helix signature of recombinant hydroxylated collagen was confirmed by circular dichroism, which also showed that hydroxylation increased the thermal stability of the recombinant collagen construct. Recombinant hydroxylated collagen produced in E. coli supported the growth of human umbilical endothelial cells, underlining the biocompatibility of the recombinant protein as extracellular matrix. The high yield of recombinant protein expression and the extensive level of prolyl and lysyl hydroxylation achieved indicate that recombinant hydroxylated collagen can be produced at large scale for biomaterials engineering in the context of biomedical applications.

11.
Prediction of RNA binding sites in a protein using SVM and PSSM profile   Cited by: 1 (self-citations: 0, citations by others: 1)
Kumar M, Gromiha MM, Raghava GP. Proteins 2008, 71(1): 189-194

12.
Kaur H, Raghava GP. FEBS Letters 2004, 564(1-2): 47-57
In this study, an attempt has been made to develop a neural network-based method for predicting segments in proteins containing aromatic-backbone NH (Ar-NH) interactions using multiple sequence alignment. We have analyzed 3121 segments seven residues long containing Ar-NH interactions, extracted from 2298 non-redundant protein structures where no two proteins have more than 25% sequence identity. Two consecutive feed-forward neural networks with a single hidden layer have been trained with standard back-propagation as the learning algorithm. The performance of the method improves from 0.12 to 0.15 in terms of Matthews correlation coefficient (MCC) value when evolutionary information (multiple alignment obtained from PSI-BLAST) is used as input instead of a single sequence. The performance of the method further improves from MCC 0.15 to 0.20 when secondary structure information predicted by PSIPRED is incorporated in the prediction. The final network yields an overall prediction accuracy of 70.1% and an MCC of 0.20 when tested by five-fold cross-validation. Overall the performance is 15.2% higher than the random prediction. The method consists of two neural networks: (i) a sequence-to-structure network which predicts the aromatic residues involved in Ar-NH interaction from multiple alignment of protein sequences and (ii) a structure-to-structure network where the input consists of the output obtained from the first network and predicted secondary structure. Further, the actual position of the donor residue within the 'potential' predicted fragment has been predicted using a separate sequence-to-structure neural network. Based on the present study, a server Ar_NHPred has been developed which predicts Ar-NH interaction in a given amino acid sequence. The web server Ar_NHPred is available at and (mirror site).

13.
Wang Cui-cui, Fang Yaping, Xiao Jiamin, Li Menglong. Amino Acids 2011, 40(1): 239-248
RNA–protein interactions play a pivotal role in various biological processes, such as mRNA processing, protein synthesis, and the assembly and function of the ribosome. In this work, we have introduced a computational method for predicting RNA-binding sites in proteins based on support vector machines by using a variety of features from amino acid sequence information including position-specific scoring matrix (PSSM) profiles, physicochemical properties and predicted solvent accessibility. Considering the influence of the surrounding residues of an amino acid and the dependency effect from the neighboring amino acids, a sliding window and a smoothing window are used to encode the PSSM profiles. The method is evaluated by outer fivefold cross-validation on the data set of 77 RNA-binding proteins (RBP77). It achieves an overall accuracy of 88.66% with a Matthews correlation coefficient (MCC) of 0.69. Furthermore, an independent data set of 39 RNA-binding proteins (RBP39) is employed to further evaluate the performance and achieves an overall accuracy of 82.36% with an MCC of 0.44. The result shows that our method has good generalization abilities in predicting RNA-binding sites for novel proteins. Compared with other previous methods, our method performs well on the same data set. The prediction results suggest that the features used are effective in predicting RNA-binding sites in proteins. The code and all data sets used in this article are freely available at .

14.
The hydroxyproline and hydroxylysine in collagen are synthesized by hydroxylation of proline and lysine after these amino acids have been incorporated into peptide linkages (for review see ref. 1). Experiments with embryonic cartilage in vitro in which the hydroxylases were intermittently inhibited demonstrated that the hydroxylations can occur after the proline-rich and lysine-rich polypeptide precursor protocollagen is released from ribosomal complexes (refs. 1, 2). There has been controversy, however, over the question of whether in uninhibited systems the hydroxylation of the appropriate prolyl and lysyl residues occurs while nascent polypeptide chains are still being assembled on ribosomes (refs. 1, 3, 4).

15.
The hydroxylation of lysine and glycosylations of hydroxylysine were studied in isolated chick-embryo tendon and cartilage cells under conditions in which collagen triple-helix formation was either inhibited or accelerated. The former situation was obtained by incubating the tendon cells with 0.6 mM dithiothreitol, thus decreasing their proline hydroxylase activity by about 99%. After labelling with [14C]proline, the formation of hydroxy[14C]proline was found to have declined by about 95%. Since the hydroxylation of a relatively large number of proline residues is required for triple-helix formation at 37°C, the pro-α-chains synthesized under these conditions apparently cannot form triple-helical molecules. Labelling experiments with [14C]lysine indicated that the degree of hydroxylation of the lysine residues in the collagen synthesized was slightly increased and the degree of the glycosylations of the hydroxylysine residues more than doubled, the largest increase being in the content of glucosylgalactosylhydroxylysine. Recovery of chick-embryo cartilage cells from temporary anoxia was used to obtain accelerated triple-helix formation. A marked decrease was found in the extent of hydroxylation of the lysine residues in the collagen synthesized under these conditions, and an even larger decrease occurred in the glycosylations of the hydroxylysine residues. The results support the previous suggestion that the triple-helix formation of the pro-α-chains prevents further hydroxylation of lysine residues and glycosylations of hydroxylysine residues during collagen biosynthesis.

16.
Background

This study aimed to investigate the prolyl and lysine hydroxylation in elastin from different species and tissues.

Methods

Enzymatic digests of elastin samples from human, cattle, pig and chicken were analyzed using mass spectrometry and bioinformatics tools.

Results

It was confirmed at the protein level that elastin does not contain hydroxylated lysine residues regardless of the species. In contrast, prolyl hydroxylation sites were identified in all elastin samples. Moreover, the analysis of the residues adjacent to prolines allowed the determination of the substrate site preferences of prolyl 4-hydroxylase. It was found that elastins from all analyzed species contain hydroxyproline and that at least 20%–24% of all proline residues were partially hydroxylated. Determination of the hydroxylation degrees of specific proline residues revealed that prolyl hydroxylation depends on both the species and the tissue but is independent of age. The fact that the highest hydroxylation degrees of proline residues were found for elastin from the intervertebral disc, together with knowledge of elastin arrangement in this tissue, suggests that hydroxylation plays a biomechanical role. Interestingly, a proline-rich domain of tropoelastin (domain 24), which contains several repeats of bioactive motifs, does not show any hydroxyproline residues in the mammals studied.

Conclusions

The results show that prolyl hydroxylation is not a coincidental feature and may contribute to the adaptation of the properties of elastin to meet the functional requirements of different tissues.

General significance

The study for the first time shows that prolyl hydroxylation is highly regulated in elastin.

17.
It has been previously shown that dermis from subjects with hydroxylysine-deficient collagen contains approximately 5% of normal levels of hydroxylysine and sonicates of skin fibroblasts contain less than 15% of normal levels of collagen lysyl hydroxylase activity. However, cultures of dermal fibroblasts from two siblings with hydroxylysine-deficient collagen (Ehlers-Danlos Syndrome Type VI) compared to fibroblasts from normal subjects synthesize collagen containing approximately 50% of normal amounts of hydroxylysine. The lysyl hydroxylase deficient cultures synthesize both Type I and Type III collagen in the same proportion as control cultures. Both alpha 1(I) and alpha 2 chains are similarly reduced in hydroxylysine content. Collagen prolyl hydroxylation by normal collagen lysyl hydroxylation is the same with or without ascorbate supplementation. In mutant cells the rate of prolyl hydroxylation measured after release of inhibition by alpha, alpha'-dipyridyl is the same as in control cells. The rate of lysyl hydroxylation is reduced in mutant cells but only to approximately 50% of normal.

18.
In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins, integrating a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM, which is trained mainly on position-specific scoring matrix features, and COMSAT_MILP, which is an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence scores, and a threshold is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets based on the contact definition of 14 Å between Cα-Cα atoms. First, using a rigorous leave-one-protein-out cross validation on the training set of 90 TM proteins, an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3% and a Matthews correlation coefficient (MCC) of 0.184 were obtained for residue pairs that are at least six amino acids apart. Second, when tested on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4% and an MCC of 0.106. COMSAT shows satisfactory results when compared with 12 other state-of-the-art predictors, and is more robust in terms of prediction accuracy as the length and complexity of TM protein increase. COMSAT is freely accessible at http://hpcc.siat.ac.cn/COMSAT/ . Proteins 2016; 84:332–348. © 2016 Wiley Periodicals, Inc.
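The contact definition quoted in this abstract (Cα-Cα distance of 14 Å, residue pairs at least six positions apart) can be written down as a small sketch. This is only an illustration of how reference contacts might be derived from coordinates, not COMSAT's own code; the function name `ca_contacts`, the 0-based indexing, and the choice to treat a distance of exactly 14 Å as a contact are assumptions.

```python
import math


def ca_contacts(ca_coords, cutoff=14.0, min_separation=6):
    """List residue pairs (i, j), 0-based, whose C-alpha atoms are in contact.

    ca_coords: one (x, y, z) tuple in angstroms per residue, in chain order.
    Assumption: a distance equal to the cutoff counts as a contact.
    """
    contacts = []
    n = len(ca_coords)
    for i in range(n):
        # Only consider pairs at least `min_separation` residues apart.
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts.append((i, j))
    return contacts


if __name__ == "__main__":
    # Toy chain: ten C-alpha atoms on a straight line, 2 angstroms apart.
    toy_chain = [(2.0 * k, 0.0, 0.0) for k in range(10)]
    print(ca_contacts(toy_chain))
```

`math.dist` requires Python 3.8 or later. Accuracy and coverage as reported in the abstract would then be computed by comparing predicted pairs with such a reference list.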

19.
The identification of virulent proteins in any de-novo sequenced genome is useful in estimating its pathogenic ability and understanding the mechanism of pathogenesis. Similarly, the identification of such proteins could be valuable in comparing the metagenome of healthy and diseased individuals and estimating the proportion of pathogenic species. However, the common challenge in both the above tasks is the identification of virulent proteins since a significant proportion of genomic and metagenomic proteins are novel and yet unannotated. The currently available tools which carry out the identification of virulent proteins provide limited accuracy and cannot be used on large datasets. Therefore, we have developed an MP3 standalone tool and web server for the prediction of pathogenic proteins in both genomic and metagenomic datasets. MP3 is developed using an integrated Support Vector Machine (SVM) and Hidden Markov Model (HMM) approach to carry out highly fast, sensitive and accurate prediction of pathogenic proteins. It displayed Sensitivity, Specificity, MCC and accuracy values of 92%, 100%, 0.92 and 96%, respectively, on blind dataset constructed using complete proteins. On the two metagenomic blind datasets (Blind A: 51–100 amino acids and Blind B: 30–50 amino acids), it displayed Sensitivity, Specificity, MCC and accuracy values of 82.39%, 97.86%, 0.80 and 89.32% for Blind A and 71.60%, 94.48%, 0.67 and 81.86% for Blind B, respectively. In addition, the performance of MP3 was validated on selected bacterial genomic and real metagenomic datasets. To our knowledge, MP3 is the only program that specializes in fast and accurate identification of partial pathogenic proteins predicted from short (100–150 bp) metagenomic reads and also performs exceptionally well on complete protein sequences. MP3 is publicly available at http://metagenomics.iiserb.ac.in/mp3/index.php.

20.
Background

Lysine succinylation is one of the reversible protein post-translational modifications (PTMs), which regulate the structure and function of proteins. It plays a significant role in various cellular physiologies including some diseases of human as well as many other organisms. The accurate identification of succinylation sites is essential to understand the various biological functions and for drug development.

Methods

In this study, we developed an improved method to predict lysine succinylation sites in Homo sapiens by the fusion of three encoding schemes, namely binary, the composition of k-spaced amino acid pairs (CKSAAP) and amino acid composition (AAC), with the random forest (RF) classifier. The prediction performance of the proposed RF-based fusion model in comparison with other candidates was investigated by using 20-fold cross-validation (CV) and two independent test datasets collected from two different sources.

Results

The CV results showed that the proposed predictor achieves the highest scores of sensitivity (SN) as 0.800, specificity (SP) as 0.902, accuracy (ACC) as 0.919, Matthews correlation coefficient (MCC) as 0.766 and partial AUC (pAUC) as 0.163 at a false-positive rate (FPR) = 0.10 and area under the ROC curve (AUC) as 0.958. It achieved the highest performance scores of SN as 0.811, SP as 0.902, ACC as 0.891, MCC as 0.629 and pAUC as 0.139 and AUC as 0.921 for the independent test protein set-1 and SN as 0.772, SP as 0.901, ACC as 0.836, MCC as 0.677 and pAUC as 0.141 at FPR = 0.10 and AUC as 0.923 for the independent test protein set-2. It also outperformed all the other existing prediction models.

Conclusion

The prediction performances discussed in this article suggest that the proposed method might be a useful and encouraging computational resource for lysine succinylation site prediction in human.
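Two of the encodings named in this abstract, AAC and CKSAAP, have simple and widely used definitions that can be sketched in a few lines. The sketch below follows the usual formulation (pair counts at gap k divided by the number of pairs available at that gap); the maximum gap `k_max=2`, the skipping of non-standard residues, and both function names are assumptions, since the abstract does not give these details. The binary encoding and the random forest step are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(fragment):
    """Amino acid composition: frequency of each of the 20 residues."""
    length = len(fragment)
    return [fragment.count(a) / length for a in AMINO_ACIDS]


def cksaap(fragment, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    Assumption: pairs containing a non-standard symbol are ignored.
    Returns 400 * (k_max + 1) values.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(fragment) - k - 1  # residue pairs separated by k residues
        for i in range(n_pairs):
            pair = fragment[i] + fragment[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(
            (counts[p] / n_pairs if n_pairs > 0 else 0.0) for p in PAIRS
        )
    return features


if __name__ == "__main__":
    fragment = "AKA"  # toy fragment; real windows are centered on a lysine
    vector = aac(fragment) + cksaap(fragment, k_max=1)
    print(len(vector))
```

Concatenating the vectors, as in the usage lines, is one straightforward way to fuse encodings; the abstract does not say how the fusion was actually done.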

