Similar articles
1.
Some organisms living at extremely low temperatures produce special proteins called “antifreeze proteins” (AFPs), which prevent cells and body fluids from freezing. AFPs are present in vertebrates, invertebrates, plants, bacteria, fungi, etc. Although AFPs share a common function, they show a high degree of diversity in sequence and structure. Search methods based on sequence similarity therefore often fail to identify AFPs in sequence databases. In this work, we report a random forest approach, “AFP-Pred”, for the prediction of antifreeze proteins from protein sequence. AFP-Pred was trained on a dataset containing 300 AFPs and 300 non-AFPs and tested on a dataset containing 181 AFPs and 9193 non-AFPs. AFP-Pred achieved 81.33% accuracy in training and 83.38% in testing. The performance of AFP-Pred was compared with BLAST and HMM. The high prediction accuracy and the successful prediction of hypothetical proteins suggest that AFP-Pred can be a useful approach for identifying antifreeze proteins from sequence information, irrespective of their sequence similarity.
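Because the test set described above is heavily imbalanced (181 AFPs against 9193 non-AFPs), a single accuracy figure can hide how well the positive class is recovered. The following is a minimal sketch of how sensitivity, specificity and accuracy relate; the function name and the confusion counts are illustrative assumptions, not values reported for AFP-Pred.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)          # fraction of positives recovered
    specificity = tn / (tn + fp)          # fraction of negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for a test set shaped like the one above
# (181 positives, 9193 negatives). These are NOT the paper's results.
sens, spec, acc = binary_metrics(tp=150, fn=31, tn=7700, fp=1493)

# A trivial classifier that labels everything "non-AFP" on the same set:
# it finds no AFPs at all, yet its accuracy is still very high.
sens_trivial, spec_trivial, acc_trivial = binary_metrics(tp=0, fn=181, tn=9193, fp=0)
```

The second call illustrates why sensitivity and specificity are worth reading alongside accuracy whenever the negative class dominates.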

2.
The wealth of protein sequence and structure data is greater than ever, thanks to the ongoing Genomics and Structural Genomics projects. The information available through such efforts needs to be analysed by new methods that combine both databases. One important result of genomic sequence analysis is the inference of functional homology among proteins. Until recently sequence similarity comparison was the only method for homologue inference. The new fold recognition approach reviewed in this paper enhances sequence comparison methods by including structural information in the process of protein comparison. This additional information often allows for the detection of similarities that cannot be found by methods that only use sequence information.

3.

Background  

Odorant binding proteins (OBPs) are believed to shuttle odorants from the environment to the underlying odorant receptors, for which they could potentially serve as odorant presenters. Although several sequence-based search methods have been exploited for protein family prediction, less effort has been devoted to the prediction of OBPs from sequence data, and this area is more challenging because of the poor sequence identity between these proteins.

4.
To study the active site(s) in protein A, partial tryptic digestions of the protein and of intact Staphylococcus aureus were performed. Fragments that bind to the Fc part of human IgG were isolated by affinity chromatography on IgG-Sepharose 4B and purified by ion-exchange chromatography on phosphocellulose. From a partial tryptic digest of pure protein A at 30 °C, pH 8.2, for 30 min we have isolated and characterized six active fragments with molecular weights ranging from 6000 to 8000. Two active fragments, obtained in high yields by digestion at pH 7.2 of intact protein-A-containing bacteria, were shown to be similar to two of the six characterized fragments from the digest of pure protein A. All fragments appeared to have similar amino acid sequences, judged by peptide mapping, specific staining and amino acid analysis; some are very possibly overlapping peptides. Each fragment probably contains only one active site region, since all are monovalent in the Fc reaction when studied with a hemagglutination technique. The maximal molar yield of active fragments obtained from the digestion of pure protein A accounts for about 210% of the amount of protein A used. Thus protein A, suggested to consist of repeating units, should exhibit at least three similar if not identical active regions.

5.
An Y, Friesner RA. Proteins 2002, 48(2): 352-366
In this work, we introduce a new method for fold recognition using composite secondary structures assembled from different secondary structure prediction servers for a given target sequence. An automatic, complete, and robust way of finding all possible combinations of predicted secondary structure segments (SSS) for the target sequence and clustering them into a few flexible clusters, each containing patterns with the same number of SSS, is developed. This program then takes two steps in choosing plausible homologues: (i) an SSS-based alignment excludes impossible templates whose SSS patterns are very different from any of those of the target; (ii) a residue-based alignment selects good structural templates based on sequence similarity and secondary structure similarity between the target and only those templates left after the first stage. The secondary structure of each residue in the target is selected from one of the predictions to find the best match with the template. Truncation is applied to a target where different predictions vary. In most cases, a target is also divided into N-terminal and C-terminal fragments, each of which is used as a separate subsequence. Our program was tested on the fold recognition targets from CASP3 with known PDB codes and some available targets from CASP4. The results are compared with a structural homologue list for each target produced by the CE program (Shindyalov and Bourne, Protein Eng 1998;11:739-747). The program successfully locates homologues with high Z-score and low root-mean-square deviation within the top 30-50 predictions in the overwhelming majority of cases.
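The abstract above judges homologues partly by root-mean-square deviation (RMSD). As a reminder of what that quantity measures, here is a minimal sketch that computes RMSD between two equally sized sets of 3D coordinates. It assumes the structures are already superimposed and that atoms are paired in order; structure comparison programs such as CE first determine an alignment and an optimal superposition, which this sketch does not attempt. The function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two lists of (x, y, z) tuples paired by index.

    Assumes both structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two toy "structures" of three atoms each (coordinates are made up).
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0), (3.0, 0.0, 1.0)]
deviation = rmsd(model, reference)
```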

6.
Fold recognition techniques assist the exploration of protein structures, and web-based servers are part of the standard set of tools used in the analysis of biochemical problems. Despite their success, current methods are only able to predict the correct fold in a relatively small number of cases. We propose an approach that improves the selection of correct folds from among the results of two methods implemented as web servers (SAM-T99 and 3D-PSSM). Our approach is based on training a system of neural networks with models generated by the servers and a set of associated characteristics such as the quality of the sequence-structure alignment, the distribution of sequence features (sequence-conserved positions and apolar residues), and the compactness of the resulting models. Our results show that it is possible to detect adequate folds to model 80% of the sequences with a high level of confidence. The improvements achieved by taking sequence characteristics into account open the door to future improvements by directly including such factors in the model generation step. This approach has been implemented as an automatic system, LIBELLULA, available as a public web server at http://www.pdg.cnb.uam.es/servers/libellula.html.

7.

Background  

Predicting the function of an unknown protein is an essential goal in bioinformatics. Sequence similarity-based approaches are widely used for function prediction; however, they are often inadequate in the absence of similar sequences or when the sequence similarity among known protein sequences is statistically weak. This study aimed to develop an accurate prediction method for identifying protein function, irrespective of sequence and structural similarities.

8.
The goal of this work is to characterize structurally ambivalent fragments in proteins. We have searched the Protein Data Bank and identified all structurally ambivalent peptides (SAPs) of length five or greater that exist in two different backbone conformations. The SAPs were classified into five distinct categories based on their structure. We propose a novel index that provides a quantitative measure of the conformational variability of a sequence fragment. It measures the context-dependent width of the distribution of (phi, psi) dihedral angles associated with each amino acid type. This index was used to analyze the local structural propensity of both SAPs and the sequence fragments contiguous to them. We also analyzed type-specific amino acid composition, solvent accessibility, and overall structural properties of SAPs and their sequence context. We show that each type of SAP has an unusual, type-specific amino acid composition and, as a result, simultaneous intrinsic preferences for two distinct types of backbone conformation. All types of SAPs have lower sequence complexity than average. Fragments that adopt helical conformation in one protein and sheet conformation in another have the lowest sequence complexity and are sampled from a relatively limited repertoire of possible residue combinations. A statistically significant difference between two distinct conformations of the same SAP is observed not only in the overall structural properties of proteins harboring the SAP but also in the properties of its flanking regions and in the pattern of solvent accessibility. These results have implications for protein design and structure prediction.

9.
MOTIVATION: Recognizing proteins that have similar tertiary structure is the key step of template-based protein structure prediction methods. Traditionally, a variety of alignment methods are used to identify similar folds, based on sequence similarity and sequence-structure compatibility. Although these methods are complementary, their integration has not been thoroughly exploited. Statistical machine learning methods provide tools for integrating multiple features, but so far these methods have been used primarily for protein and fold classification, rather than addressing the retrieval problem of fold recognition, that is, finding a proper template for a given query protein. RESULTS: Here we present a two-stage machine learning, information retrieval approach to fold recognition. First, we use alignment methods to derive pairwise similarity features for query-template protein pairs. We also use global profile-profile alignments in combination with predicted secondary structure, relative solvent accessibility, contact map and beta-strand pairing to extract pairwise structural compatibility features. Second, we apply support vector machines to these features to predict the structural relevance (i.e. in the same fold or not) of the query-template pairs. For each query, the continuous relevance scores are used to rank the templates. The FOLDpro approach is modular, scalable and effective. Compared with 11 other fold recognition methods, FOLDpro yields the best results in almost all standard categories on a comprehensive benchmark dataset. Using predictions of the top-ranked template, the sensitivity is approximately 85, 56, and 27% at the family, superfamily and fold levels, respectively. Using the 5 top-ranked templates, the sensitivity increases to 90, 70, and 48%.
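The evaluation described above ranks templates per query by a continuous relevance score and reports sensitivity for the top-ranked and the 5 top-ranked templates. A minimal sketch of that kind of top-k measure follows. The data layout, the function name, and the scores are assumptions made for illustration; they are not FOLDpro's actual interface or results.

```python
def top_k_sensitivity(scores, relevant, k):
    """Fraction of queries with at least one relevant template in the top k.

    scores:   {query: {template: relevance_score}}
    relevant: {query: set of templates truly in the same class}
    """
    hits = 0
    for query, template_scores in scores.items():
        # Rank templates from highest to lowest relevance score.
        ranked = sorted(template_scores, key=template_scores.get, reverse=True)
        if any(template in relevant[query] for template in ranked[:k]):
            hits += 1
    return hits / len(scores)


# Made-up scores for two queries against three templates.
scores = {
    "q1": {"t1": 0.9, "t2": 0.4, "t3": 0.1},
    "q2": {"t1": 0.2, "t2": 0.7, "t3": 0.6},
}
relevant = {"q1": {"t1"}, "q2": {"t3"}}

top1 = top_k_sensitivity(scores, relevant, k=1)  # only q1 is a hit
top2 = top_k_sensitivity(scores, relevant, k=2)  # q2's relevant template is ranked second
```

Allowing more templates per query can only keep or raise this measure, which matches the increase from top-1 to top-5 reported in the abstract.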

10.
Physicochemical properties of amino acids are important factors in determining protein structure and function. Most approaches make use of properties averaged over entire domains or even proteins to analyze their structure or function. This level of coarseness tends to hide the richness of the variability in the different properties across functional domains. This paper studies the conservation of physicochemical properties in a functionally similar family of proteins using a novel wavelet-based technique known as multiresolution analysis. Such an analysis can help uncover characteristics that would otherwise remain hidden. We have studied the protein kinase family of sequences and our findings are as follows: (a) a number of different properties are conserved over the functional catalytic domain irrespective of the sequence identities; (b) conservation of properties can be observed at different frequency levels, and they agree well with the known structural/functional properties of the subdomains for the protein kinase family; (c) structural differences between the different kinase family members are reflected in the waveforms; and (d) functionally important mutations show distortions in the waveforms of conserved properties. The potential usefulness of the above findings in identifying functionally similar sequences in the twilight and midnight zones is demonstrated through a simple prediction model for the protein kinase family, which achieved a recall of 93.7% and a precision of 96.75% in cross-validation tests.
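To make the idea of multiresolution analysis of a property profile concrete, the sketch below maps a sequence to Kyte-Doolittle hydropathy values and decomposes the profile with a Haar wavelet into a coarse approximation plus detail coefficients at successive scales. The abstract does not say which wavelet or which properties were used, so the Haar wavelet, the hydropathy scale, the example sequence, and all function names here are assumptions chosen for simplicity.

```python
import math

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def haar_step(signal):
    """One Haar level: pairwise sums (approximation) and differences (detail).

    If the signal has odd length, the last value is ignored.
    """
    scale = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / scale for i in pairs]
    return approx, detail


def multiresolution(signal, levels):
    """Return (coarsest approximation, [detail per level, finest first])."""
    current = list(signal)
    details = []
    for _ in range(levels):
        if len(current) < 2:
            break
        current, detail = haar_step(current)
        details.append(detail)
    return current, details


# Arbitrary 16-residue example sequence (not taken from any kinase).
profile = [KYTE_DOOLITTLE[aa] for aa in "GAVLIKRDEFSTWYNQ"]
approx, details = multiresolution(profile, levels=3)
```

Each detail list captures variation at one scale (adjacent residues first, then blocks of two, then blocks of four), which is the sense in which conservation can be examined "at different frequency levels".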

11.
Combining single-molecule atomic force microscopy (AFM) and protein engineering techniques, we demonstrate here that recombination-based techniques can be used to engineer novel elastomeric proteins by recombining protein fragments from structurally homologous parent proteins. Using the I27 and I32 domains from the muscle protein titin as parent template proteins, we systematically shuffled the secondary structural elements of the two parent proteins and engineered 13 hybrid daughter proteins. Although I27 and I32 are highly homologous, and homology modeling predicted that the hybrid daughter proteins fold into structures similar to those of the parent proteins, we found that only eight of the 13 daughter proteins showed beta-sheet dominated structures similar to the parent proteins; the other five recombined proteins showed signatures of significant alpha-helical or random coil-like structure. Single-molecule AFM revealed that six recombined daughter proteins are mechanically stable and exhibit mechanical properties that differ from those of the parent proteins. In contrast, another four of the hybrid proteins were found to be mechanically labile and to unfold at forces below approximately 20 pN, as we could not detect any unfolding force peaks. The last three hybrid proteins showed an interesting duality in their mechanical unfolding behaviors. These results demonstrate the great potential of recombination-based approaches for engineering novel elastomeric protein domains with diverse mechanical properties. Moreover, our results also reveal the challenges and complexity of developing a recombination-based approach into a laboratory-based directed evolution approach to engineer novel elastomeric proteins.

12.
MOTIVATION: A method for recognizing the three-dimensional fold from the protein amino acid sequence based on a combination of hidden Markov models (HMMs) and secondary structure prediction was recently developed for proteins in the Mainly-Alpha structural class. Here, this methodology is extended to Mainly-Beta and Alpha-Beta class proteins. Compared to other fold recognition methods based on HMMs, this approach is novel in that only secondary structure information is used. Each HMM is trained from known secondary structure sequences of proteins having a similar fold. Secondary structure prediction is performed for the amino acid sequence of a query protein. The predicted fold of a query protein is the fold described by the model that best fits the predicted sequence. RESULTS: After model cross-validation, the success rate on 44 test proteins covering the three structural classes was found to be 59%. On seven fold predictions performed prior to the publication of the experimental structure, the success rate was 71%. In conclusion, this approach manages to capture important information about the fold of a protein embedded in the length and arrangement of the predicted helices, strands and coils along the polypeptide chain. When a more extensive library of HMMs representing the universe of known structural families is available (work in progress), the program will allow rapid screening of genomic databases and sequence annotation when fold similarity is not detectable from the amino acid sequence. AVAILABILITY: FORESST web server at http://absalpha.dcrt.nih.gov:8008/ for the library of HMMs of structural families used in this paper. FORESST web server at http://www.tigr.org/ for a more extensive library of HMMs (work in progress). CONTACT: valedf@tigr.org; munson@helix.nih.gov; garnier@helix.nih.gov

13.
Proteins containing a hemopexin fold domain are suggested to have diverse functions in various living organisms. To investigate the structure and function of this type of protein in rice (Oryza sativa), the gene encoding a hemopexin fold protein (OsHFP) was cloned, analyzed in silico and characterized. Molecular modeling revealed that OsHFP is closely related to other hemopexin fold proteins, but is unique in having a cylindrical central tunnel as well as extended N- and C-terminal domains. The recombinant OsHFP was found to bind hemin, the oxidized form of heme, in vitro. Expression of the single-copy OsHFP gene was detected in rice flower buds. Heterologous expression of OsHFP in green leaf tissues resulted in chlorophyll degradation; however, stable expression of OsHFP was observed in transgenic hairy roots, a non-green tissue. A possible role of OsHFP in regulating programmed cell death in the green tissues of rice anthers is proposed.

14.
Shepherd AJ, Gorse D, Thornton JM. Proteins 2003, 50(2): 290-302
A novel method is presented for the prediction of protein architecture from sequence using neural networks. The method involves preprocessing protein sequence data by numerically encoding it and then applying a Fourier transform. The encoded and transformed data are then used to train a neural network to recognize a number of different protein architectures. The method proved significantly better than comparable alternative strategies such as percentage dipeptide frequency, but is still limited by the size of the data set and the input demands of a neural network. Its main potential is as a complement to existing fold recognition techniques; its greatest strength is its ability to identify global symmetries within protein structures.
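The preprocessing described above (numerically encode the sequence, then apply a Fourier transform) can be sketched as follows. The abstract does not specify the encoding, so the binary hydrophobic/non-hydrophobic encoding, the residue set, and the function names below are assumptions; the transform is a plain discrete Fourier transform written out with the standard library, and the neural network stage is not shown.

```python
import cmath

# Assumed encoding: 1.0 for a (roughly) hydrophobic residue, 0.0 otherwise.
HYDROPHOBIC = set("AVLIMFWC")


def encode(sequence):
    """Map each residue to a number (illustrative binary encoding)."""
    return [1.0 if aa in HYDROPHOBIC else 0.0 for aa in sequence]


def power_spectrum(values):
    """Power at frequencies 0..n//2 of the mean-centered signal (naive DFT)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


# A toy sequence alternating hydrophobic and charged residues (period 2).
toy_sequence = "LKLKLKLKLKLKLKLK"
spectrum = power_spectrum(encode(toy_sequence))
peak_index = spectrum.index(max(spectrum))  # frequency bin with most power
```

For this toy input the power concentrates in the bin corresponding to a period of two residues, which illustrates how a transform of this kind exposes repeats and symmetries in an encoded sequence.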

15.
MOTIVATION: The pairwise alignment of biological sequences obtained from an algorithm will in general contain both correct and incorrect parts. Hence, to allow for a valid interpretation of the alignment, the local trustworthiness of the alignment has to be quantified. RESULTS: We present a novel approach that attributes a reliability index to every pair of residues, including gapped regions, in the optimal alignment of two protein sequences. The method is based on a fuzzy recast of the dynamic programming algorithm for sequence alignment in terms of mean field annealing. An extensive evaluation with structural reference alignments not only shows that the probability for a pair of residues to be correctly aligned grows consistently with increasing reliability index, but moreover demonstrates that the value of the reliability index can directly be translated into an estimate of the probability for a correct alignment.

16.
Lipid binding proteins play important roles in signaling, regulation, membrane trafficking, immune response, lipid metabolism, and transport. Because of their functional and sequence diversity, it is desirable to explore additional methods for predicting lipid binding proteins irrespective of sequence similarity. This work explores the use of support vector machines (SVMs) as such a method. SVM prediction systems are developed using 14,776 lipid binding and 133,441 nonlipid binding proteins and are evaluated by an independent set of 6,768 lipid binding and 64,761 nonlipid binding proteins. The computed prediction accuracy is 78.9, 79.5, 82.2, 79.5, 84.4, 76.6, 90.6, 79.0, and 89.9% for lipid degradation, lipid metabolism, lipid synthesis, lipid transport, lipid binding, lipopolysaccharide biosynthesis, lipoprotein, lipoyl, and all lipid binding proteins, respectively. The accuracy for the nonmember proteins of each class is 99.9, 99.2, 99.6, 99.8, 99.9, 99.8, 98.5, 99.9, and 97.0%, respectively. Comparable accuracies are obtained when homologous proteins are considered as one, or by using a different SVM kernel function. Our method predicts 86.8% of the 76 lipid binding proteins nonhomologous to any protein in the Swiss-Prot database and 89.0% of the 73 known lipid binding domains as lipid binding. These findings suggest the usefulness of SVMs for facilitating the prediction of lipid binding proteins. Our software can be accessed at the SVMProt server (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi).

18.
Summary: A combination of calculation and experiment is used to demonstrate that the global fold of larger proteins can be rapidly determined using limited NMR data. The approach involves a combination of heteronuclear triple resonance NMR experiments with protonation of selected residue types in an otherwise completely deuterated protein. This method of labelling produces proteins with α-specific deuteration in the protonated residues, and the results suggest that this will improve the sensitivity of experiments involving correlation of side-chain (1H and 13C) and backbone (1H and 15N) amide resonances. It will allow the rapid assignment of backbone resonances with high sensitivity and the determination of a reasonable structural model of a protein based on limited NOE restraints, an application that is of increasing importance as data from the large number of genome sequencing projects accumulate. The method that we propose should also be of utility in extending the use of NMR spectroscopy to determine the structures of larger proteins. The first two authors contributed equally to this work.

19.
Three proteins from extremophilic bacteria (a hypothetical monooxygenase from Deinococcus radiodurans, a hypothetical nucleotidyl transferase from Thermotoga maritima, and a hypothetical oxidoreductase from Exiguobacterium sibiricum) and the DJ-1 chaperone protein from Homo sapiens have been produced in Escherichia coli. The isolation and purification procedures developed for the recombinant proteins allowed us to achieve yields higher than 96%. Crystallization conditions enabling stable growth of crystals have been determined. X-ray experiments have been performed to test the quality of the crystals, and the resolution achieved ranged from 1.2 to 1.8 Å.
