Similar articles
Found 20 similar articles.
1.
2.
MOTIVATION: We focus on the prediction of disulfide bridges in proteins starting from their amino acid sequence and from the knowledge of the disulfide bonding state of each cysteine. The location of disulfide bridges is a structural feature that conveys important information about the protein main chain conformation and can therefore help towards the solution of the folding problem. Existing approaches based on weighted graph matching algorithms do not take advantage of evolutionary information. Recursive neural networks (RNN), on the other hand, can handle in a natural way complex data structures such as graphs whose vertices are labeled by real vectors, allowing us to incorporate multiple alignment profiles in the graphical representation of disulfide connectivity patterns. RESULTS: The core of the method is the use of machine learning tools to rank alternative disulfide connectivity patterns. We develop an ad-hoc RNN architecture for scoring labeled undirected graphs that represent connectivity patterns. In order to compare our algorithm with previous methods, we report experimental results on the SWISS-PROT 39 dataset. We find that using multiple alignment profiles allows us to obtain significant prediction accuracy improvements, clearly demonstrating the important role played by evolutionary information. AVAILABILITY: The Web interface of the predictor is available at http://neural.dsi.unifi.it/cysteines
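The idea of ranking alternative connectivity patterns can be illustrated with a small sketch. It enumerates every way of pairing up a set of bonded cysteines and sorts the candidates by a score. The names `connectivity_patterns`, `rank_patterns` and `toy_score`, as well as the cysteine positions, are assumptions made for illustration; in particular `toy_score` is a hypothetical stand-in and not the RNN scorer described in the abstract.

```python
from typing import Callable, Iterator, List, Tuple

Bond = Tuple[int, int]
Pattern = List[Bond]


def connectivity_patterns(cysteines: List[int]) -> Iterator[Pattern]:
    """Yield every way to pair up an even-length list of bonded cysteines."""
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    # Pair the first cysteine with each possible partner, then recurse.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in connectivity_patterns(remaining):
            yield [(first, partner)] + tail


def rank_patterns(cysteines: List[int],
                  score: Callable[[Pattern], float]) -> List[Pattern]:
    """Sort candidate patterns from highest to lowest score."""
    return sorted(connectivity_patterns(cysteines), key=score, reverse=True)


def toy_score(pattern: Pattern) -> float:
    # Hypothetical stand-in for a learned scorer: prefers bonds between
    # cysteines that are close together in the sequence.
    return -float(sum(abs(a - b) for a, b in pattern))


positions = [10, 20, 50, 60]  # made-up cysteine positions
ranked = rank_patterns(positions, toy_score)
print(len(ranked), ranked[0])
```

Four bonded cysteines give 3 candidate patterns and six give 15, so the number of candidates grows quickly with the number of bridges. That growth is one reason a scoring function is needed to pick among them.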

3.
MOTIVATION: Several kernel-based methods have been recently introduced for the classification of small molecules. Most available kernels on molecules are based on 2D representations obtained from chemical structures, but far less work has focused so far on the definition of effective kernels that can also exploit 3D information. RESULTS: We introduce new ideas for building kernels on small molecules that can effectively use and combine 2D and 3D information. We tested these kernels in conjunction with support vector machines for binary classification on the 60 NCI cancer screening datasets as well as on the NCI HIV data set. Our results show that 3D information leveraged by these kernels can consistently improve prediction accuracy in all datasets. AVAILABILITY: An implementation of the small molecule classifier is available from http://www.dsi.unifi.it/neural/src/3DDK.

4.
Lise S  Buchan D  Pontil M  Jones DT 《PloS one》2011,6(2):e16774
Protein-protein interactions are critically dependent on just a few 'hot spot' residues at the interface. Hot spots make a dominant contribution to the free energy of binding and they can disrupt the interaction if mutated to alanine. Here, we present HSPred, a support vector machine (SVM)-based method to predict hot spot residues, given the structure of a complex. HSPred represents an improvement over a previously described approach (Lise et al, BMC Bioinformatics 2009, 10:365). It achieves higher accuracy by treating separately predictions involving either an arginine or a glutamic acid residue. These are the amino acid types on which the original model did not perform well. We have therefore developed two additional SVM classifiers, specifically optimised for these cases. HSPred reaches an overall precision and recall of 61% and 69%, respectively, which roughly corresponds to a 10% improvement. An implementation of the described method is available as a web server at http://bioinf.cs.ucl.ac.uk/hspred. It is free to non-commercial users.

5.
The R package mosclust (model order selection for clustering problems) implements algorithms based on the concept of stability for discovering significant structures in bio-molecular data. The software library provides stability indices obtained through different data perturbation methods (resampling, random projections, noise injection), as well as statistical tests to assess the significance of multi-level structures singled out from the data. Availability: http://homes.dsi.unimi.it/~valenti/SW/mosclust/download/mosclust_1.0.tar.gz. Supplementary information: http://homes.dsi.unimi.it/~valenti/SW/mosclust.

6.
7.
Native interleukin-2 (IL-2) contains three cysteines; two exist in a disulfide bridge (Cys-58 and Cys-105) and the third, Cys-125, is a free sulfhydryl. In the presence of 6 M guanidine hydrochloride at alkaline pH, IL-2 is converted into three isomers. Each isomer represents one of the three possible disulfide-linked forms that can be generated from three cysteines. These three isomers were resolved on a C4 reverse-phase HPLC system. The identity of each of the three forms was determined by carboxymethylation of the free cysteines in each isomer with [3H]iodoacetic acid followed by determination of the labelled cysteines by tryptic peptide mapping. Tryptic peptide mapping of the more predominant of the two scrambled peaks showed it to be the Cys-105-S-S-Cys-125 linked form of IL-2. A Ser-125 construction of IL-2, which lacks a free cysteine, did not scramble under these conditions. These experiments demonstrate the utility of reverse-phase HPLC in studies of protein folding and disulfide bond structure.

8.
SUMMARY: Disulfide by Design is a program for the design of novel disulfide bonds in proteins. Protein structure files in PDB format are analyzed to identify residue pairs that are likely to form a disulfide bond if the respective amino acids are mutated to cysteines. The output displays residue pairs having the appropriate geometric characteristics for disulfide formation and provides automated generation of modified PDB files including modeled disulfides. Validation demonstrates a high level of accuracy for the algorithm. AVAILABILITY: http://www.ehscenter.org/dbd/ Supplementary information: http://www.ehscenter.org/dbd/

9.
10.
The locations of disulfide bonds and free cysteines in the heavy and light chains of recombinant human factor VIII were determined by sequence analysis of fragments produced by chemical and enzymatic digestions. The A1 and A2 domains of the heavy chain and the A3 domain of the light chain contain one free cysteine and two disulfide bonds, whereas the C1 and C2 domains of the light chain have one disulfide bond and no free cysteine. The positions of these disulfide bonds are conserved in factor V and ceruloplasmin except that the second disulfide bond in the A3 domain is missing in both factor V and ceruloplasmin. The positions of the three free cysteines of factor VIII are the same as three of the four cysteines present in ceruloplasmin. However, the positions of the free cysteines in factor VIII and ceruloplasmin are not conserved in factor V.

11.
Nuclear localization signals (NLSs) are stretches of residues in proteins that mediate their import into the nucleus. NLSs are known to have diverse patterns, of which only a limited number are covered by currently known NLS motifs. Here we propose a sequential pattern mining algorithm, SeqNLS, to effectively identify potential NLS patterns without being constrained by the limitations of current knowledge of NLSs. The extracted frequent sequential patterns are used to predict NLS candidates, which are then filtered by a linear motif-scoring scheme based on predicted sequence disorder and by the relatively local conservation (IRLC) based masking. The experimental results on the newly curated Yeast and Hybrid datasets show that SeqNLS is effective in detecting potential NLSs. The performance comparison between SeqNLS with and without the linear motif scoring shows that linear motif features are highly complementary to sequence features in discerning NLSs. For the two independent datasets, SeqNLS not only consistently finds over 50% of NLSs with a prediction precision of at least 0.7, but also outperforms other state-of-the-art NLS prediction methods in terms of F1 score or prediction precision with similar or higher recall rates. The web server of the SeqNLS algorithm is available at http://mleg.cse.sc.edu/seqNLS.
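Several abstracts in this list report precision, recall and F1 score. As a reminder of how these relate, here is a minimal sketch that computes them from raw counts. The function name `precision_recall_f1` and the counts are assumptions chosen only so that the result matches the "over 50% of NLSs at a precision of 0.7" operating point mentioned above; they are not data from the paper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up counts: 7 correct predictions, 3 spurious ones, 7 missed signals.
precision, recall, f1 = precision_recall_f1(tp=7, fp=3, fn=7)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

With these counts the precision is 0.70 and the recall is 0.50, which gives an F1 of about 0.58.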

12.
Site-directed mutagenesis has been used to insert cysteine residues at specific locations in the myosin light chain 2 (LC2) sequence. The aim was to modify these cysteines with one or more spectroscopic probes and to reconstitute myosin with labeled light chains for structural studies. Native LC2 has two endogenous cysteine residues at positions 126 and 155; a third sulfhydryl was added by replacing either Pro2, Ser73, or Pro94 with cysteine. By oxidizing the endogenous cysteines to an intramolecular disulfide bond (Katoh, T., and Lowey, S., (1989) J. Cell Biol. 109, 1549), it was expected that the new cysteine could be selectively labeled with a fluorescent probe. This proved more difficult to accomplish than anticipated due to the formation of secondary disulfide bonds between the newly engineered cysteines and the native ones. Nevertheless, the unpaired cysteines were labeled with 5-(iodoacetamido)fluorescein, and singly labeled species were purified by ion-exchange chromatography. Chymotryptic digestion of the light chains, followed by high performance liquid chromatography separation of the peptides, led to the identification of the fluorescein-labeled cysteines. After light chain exchange into myosin, the position of the thiols was mapped by antifluorescyl antibodies in the electron microscope. Rotary-shadowed images showed the antibody bound at the head/rod junction of myosin for all the mutants. These mapping studies, together with the finding that widely separated cysteines can form multiple disulfide bonds, support a model for LC2 as a flexible, globular molecule that resembles other Ca/Mg-binding proteins in tertiary structure.

13.
Error-tolerant backbone resonance assignment is the cornerstone of the NMR structure determination process. Although a variety of assignment approaches have been developed, none works sufficiently well on noisy fully automatically picked peaks to enable the subsequent automatic structure determination steps. We have designed an integer linear programming (ILP) based assignment system (IPASS) that has enabled fully automatic protein structure determination for four test proteins. IPASS employs probabilistic spin system typing based on chemical shifts and secondary structure predictions. Furthermore, IPASS extracts connectivity information from the inter-residue information and the (automatically picked) ¹⁵N-edited NOESY peaks which are then used to fix reliable fragments. When applied to automatically picked peaks for real proteins, IPASS achieves an average precision and recall of 82% and 63%, respectively. In contrast, the next best method, MARS, achieves an average precision and recall of 77% and 36%, respectively. The assignments generated by IPASS are then fed into our protein structure calculation system, FALCON-NMR, to determine the 3D structures without human intervention. The final models have backbone RMSDs of 1.25 Å, 0.88 Å, 1.49 Å, and 0.67 Å to the reference native structures for proteins TM1112, CASKIN, VRAR, and HACS1, respectively. The web server is publicly available at http://monod.uwaterloo.ca/nmr/ipass.

14.
Chang JM  Su EC  Lo A  Chiu HS  Sung TY  Hsu WL 《Proteins》2008,72(2):693-710
Prediction of protein subcellular localization (PSL) is important for genome annotation, protein function prediction, and drug discovery. Many computational approaches for PSL prediction based on protein sequences have been proposed in recent years for Gram-negative bacteria. We present PSLDoc, a method based on gapped-dipeptides and probabilistic latent semantic analysis (PLSA) to solve this problem. A protein is considered as a term string composed by gapped-dipeptides, which are defined as any two residues separated by one or more positions. The weighting scheme of gapped-dipeptides is calculated according to a position specific score matrix, which includes sequence evolutionary information. Then, PLSA is applied for feature reduction, and reduced vectors are input to five one-versus-rest support vector machine classifiers. The localization site with the highest probability is assigned as the final prediction. It has been reported that there is a strong correlation between sequence homology and subcellular localization (Nair and Rost, Protein Sci 2002;11:2836-2847; Yu et al., Proteins 2006;64:643-651). To properly evaluate the performance of PSLDoc, a target protein can be classified into low- or high-homology data sets. PSLDoc's overall accuracy of low- and high-homology data sets reaches 86.84% and 98.21%, respectively, and it compares favorably with that of CELLO II (Yu et al., Proteins 2006;64:643-651). In addition, we set a confidence threshold to achieve a high precision at specified levels of recall rates. When the confidence threshold is set at 0.7, PSLDoc achieves 97.89% in precision which is considerably better than that of PSORTb v.2.0 (Gardy et al., Bioinformatics 2005;21:617-623). Our approach demonstrates that the specific feature representation for proteins can be successfully applied to the prediction of protein subcellular localization and improves prediction accuracy. 
Moreover, because of the generality of the representation, our method can be extended to eukaryotic proteomes in the future. The web server of PSLDoc is publicly available at http://bio-cluster.iis.sinica.edu.tw/~bioapp/PSLDoc/.
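The gapped-dipeptide representation can be made concrete with a small sketch. It counts ordered residue pairs together with the distance between their positions, reading "separated by one or more positions" as a position difference of at least 1. That reading, the function name `gapped_dipeptide_counts`, the `max_distance` cutoff and the toy sequence are all assumptions for illustration. The abstract weights the terms with a position specific score matrix; the plain counts used here are a simplification.

```python
from collections import Counter


def gapped_dipeptide_counts(sequence: str, max_distance: int) -> Counter:
    """Count (left residue, distance, right residue) terms in a sequence.

    distance is the difference between the two residue positions,
    ranging from 1 up to max_distance (an assumed cutoff).
    """
    counts: Counter = Counter()
    for i, left in enumerate(sequence):
        for distance in range(1, max_distance + 1):
            j = i + distance
            if j >= len(sequence):
                break  # no residue this far to the right
            counts[(left, distance, sequence[j])] += 1
    return counts


counts = gapped_dipeptide_counts("ACAC", max_distance=2)
for term, n in sorted(counts.items()):
    print(term, n)
```

Each protein then becomes a bag of such terms, much like a document made of words, which is the kind of input a method such as PLSA can reduce to a shorter feature vector.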

15.
Plant purple acid phosphatases - genes, structures and biological function   Total citations: 3, self-citations: 0, citations by others: 3
The properties of plant purple acid phosphatases (PAPs), metallophosphoesterases present in some bacteria, plants and animals, are reviewed. All members of this group contain a characteristic set of seven amino-acid residues involved in metal ligation. Animal PAPs contain a binuclear metallic center composed of two irons, whereas in plant PAPs one iron ion is joined by a zinc or manganese ion. Among plant PAPs two groups can be distinguished: small PAPs, monomeric proteins with molecular mass around 35 kDa, structurally close to mammalian PAPs, and large PAPs, homodimeric proteins with a single polypeptide of about 55 kDa. Large plant PAPs exhibit two types of structural organization. One type comprises enzymes with subunits bound by a disulfide bridge formed by cysteines located in the C-terminal region around position 350. In the second type no cysteines are located in this position and no disulfide bridges are formed between subunits. Differences in structural organisation are reflected in substrate preferences. Recent data reveal in plants the occurrence of metallophosphoesterases structurally different from small or large PAPs but with metal-ligating sequences characteristic for PAPs and expressing pronounced specificity towards phytate or diphosphate nucleosides and inorganic pyrophosphate.

16.
ToolShop: prerelease inspections for protein structure prediction servers.   Total citations: 2, self-citations: 0, citations by others: 2
The ToolShop server makes it possible to compare a protein tertiary structure prediction server with other popular servers before releasing it to the public. The comparison is conducted on a set of 203 proteins and the collected models are compared with over 20 other programs using various assessment procedures. The evaluation lasts about one week. AVAILABILITY: The ToolShop server is available at http://BioInfo.PL/ToolShop/. The administrator should be contacted to couple the tested server to the evaluation suite. CONTACT: leszek@bioinfo.pl SUPPLEMENTARY INFORMATION: The evaluation procedures are similar to those implemented in the continuous online server evaluation program, LiveBench. Additional information is available from its homepage (http://BioInfo.PL/LiveBench/).

17.
N6-methyladenine (6mA) is one kind of post-replication modification (PTM or PTRM) occurring in a wide range of DNA sequences. Accurate identification of its sites will be very helpful for revealing the biological functions of 6mA, but it is time-consuming and expensive to determine them by experiments alone. Unfortunately, so far, no bioinformatics tool is available to do so. To fill this gap, we have proposed a novel predictor called iDNA6mA-PseKNC that is established by incorporating nucleotide physicochemical properties into Pseudo K-tuple Nucleotide Composition (PseKNC). It has been observed via rigorous cross-validations that the predictor's sensitivity (Sn), specificity (Sp), accuracy (Acc), and stability (MCC) are 93%, 100%, 96%, and 0.93, respectively. For the convenience of most experimental scientists, a user-friendly web server for iDNA6mA-PseKNC has been established at http://lin-group.cn/server/iDNA6mA-PseKNC, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
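The four metrics quoted above (Sn, Sp, Acc and MCC, the Matthews correlation coefficient) are all functions of one confusion matrix. The sketch below shows the standard formulas. The counts are hypothetical: they assume a balanced set of 100 positive and 100 negative samples, chosen only to reproduce Sn = 93% and Sp = 100%, and are not taken from the paper. The function name `confusion_metrics` is likewise an assumption.

```python
import math
from typing import Dict


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute sensitivity, specificity, accuracy and MCC from confusion counts."""
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical balanced benchmark: 100 positives, 100 negatives.
metrics = confusion_metrics(tp=93, fp=0, tn=100, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under this assumed class balance, the accuracy comes out at 0.965 and the MCC at about 0.93, close to the values reported in the abstract. A different class balance would give different Acc and MCC for the same Sn and Sp.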

18.
The disulfide bonds in the galactose-specific lectin SEL 24K from the egg of the Chinook salmon Oncorhynchus tshawytscha were determined by mass spectrometry. Four predictive in silico tools were used to determine the oxidation state of cysteines in the sequence and possible location of the disulfide bonds. A combination of tryptic digestion, HPLC separation, and chemical modifications was used to establish the location of seven disulfide bonds and one pair of free cysteines. After proteolysis, peptides containing one or two disulfide bonds were identified by reduction and mass spectral comparison. MALDI mass spectrometry was supported by chemical modification (iodoacetamide) and in silico digestion. The assignments of disulfide bonds were further confirmed by mass spectral fragmentation studies including in-source dissociation (ISD) and collision-induced dissociation (CID). The experimentally determined disulfide bonds and free Cys residues were only partially consistent with those generated by several automated public-domain algorithms.

19.
Predicting the oxidation state of cysteines by multiple sequence alignment   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: Protein sequences found in databanks usually do not report post-translational covalent modifications such as the oxidation state of cysteine (Cys) residues. Accurate prediction of whether a functionally or structurally important Cys occurs in the oxidized or thiol form would be helpful for molecular biology experiments and structure prediction. RESULTS: A new method is presented for predicting the oxidation state of Cys residues based on multiple sequence alignments and on the observation that Cys tends to occur in the same oxidation state within the same protein. The prediction of the redox state of Cys performs above 82%. The oxidation state of Cys correlates with the cellular location of the given protein, but the correlation is not perfect (up to 70%). We also perform a statistical analysis of the different redox states of Cys found in secondary structures and buried positions, and of the secondary structures linked by disulfide bonds. The results suggest that the natural borderline lies between the different oxidation states of Cys rather than between the half cystines and cysteines. AVAILABILITY: A web server implementing the prediction method is available at http://guitar.rockefeller.edu/~andras/cyspred.html CONTACT: fisera@rockefeller.edu

20.
RelEx--relation extraction using dependency parse trees   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: The discovery of regulatory pathways, signal cascades, metabolic processes or disease models requires knowledge of individual relations such as physical or regulatory interactions between genes and proteins. Most interactions mentioned in the free text of biomedical publications are not yet contained in structured databases. RESULTS: We developed RelEx, an approach for relation extraction from free text. It is based on natural language preprocessing producing dependency parse trees and applying a small number of simple rules to these trees. We applied RelEx to a comprehensive set of one million MEDLINE abstracts dealing with gene and protein relations and extracted approximately 150,000 relations with an estimated performance of both 80% precision and 80% recall. AVAILABILITY: The natural language preprocessing tools used are free for academic research. Test sets and relation term lists are available from our website (http://www.bio.ifi.lmu.de/publications/RelEx/).
