Similar literature
 20 similar articles found (search time: 31 ms)
1.
In the past, a large number of methods have been developed for predicting various characteristics of a protein from its composition. In order to exploit the full potential of protein composition, we developed the web-server COPid to assist researchers in annotating the function of a protein from its composition, using the whole protein or part of it. COPid has three modules called search, composition and analysis. The search module allows searching of protein sequences in six different databases. Search results list database proteins in ascending order of Euclidean distance or descending order of compositional similarity with the query sequence. The composition module allows calculation of the composition of a sequence and the average composition of a group of sequences. The composition module also allows computing the composition of various types of amino acids (e.g. charged, polar and hydrophobic residues). The analysis module provides the following options: (i) comparing the composition of two classes of proteins, (ii) creating a phylogenetic tree based on composition and (iii) generating input patterns for machine learning techniques. We have evaluated the performance of composition-based (or alignment-free) similarity search in predicting the subcellular localization of proteins. It was found that the alignment-free method performs reasonably well in predicting certain classes of proteins. The COPid web-server is available at http://www.imtech.res.in/raghava/copid/.
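
The composition-based search described in this abstract can be illustrated with a minimal sketch. It is not COPid's actual code: the function names, the percentage scale and the toy database are assumptions made for illustration only. It computes the 20-residue composition of each sequence and ranks database entries by ascending Euclidean distance to the query.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(sequence):
    """Return the percentage of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]

def euclidean_distance(comp_a, comp_b):
    """Euclidean distance between two composition vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(comp_a, comp_b)))

def rank_by_composition(query, database):
    """Return (distance, name) pairs sorted by ascending distance to the query.

    `database` is a hypothetical dict mapping a protein name to its sequence.
    """
    query_comp = composition(query)
    scored = [
        (euclidean_distance(query_comp, composition(seq)), name)
        for name, seq in database.items()
    ]
    return sorted(scored)
```

For example, `rank_by_composition("ACDE", {"same": "EDCA", "diff": "WWWW"})` places `"same"` first, because composition ignores residue order. That is also why such a search needs no alignment.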

2.
MOTIVATION: Multiple alignments of proteins are an effective way of identifying conserved amino acids that provide clues to functional relationships among proteins. Quantitation of the abundances of amino acids found at each position in a sequence motif can provide a basis for understanding the structural and functional constraints at each point. Distribution of information across a motif has been used previously, but the non-intuitive nature of the analysis has limited its impact. RESULTS: Here, we introduce a quantitative measure of amino acid sequence diversity (DIVAA) that has a simple, intuitive meaning. Diversity, as a measure of sequence conservation or variation, is inextricably linked to the probability of selecting identical pairs from a distribution. We demonstrate its utility through the analysis of four populations: ATP-binding P-loops, hypervariable domains of kappa light chains, signal sequences, and the N- and C-termini of proteins. DIVAA provides a simple means to generate hypotheses concerning the contribution of individual residues to the functional and evolutionary relationships among proteins. AVAILABILITY: Access to DIVAA software is available at RELIC (http://relic.bio.anl.gov).
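
The abstract does not give the formula, so the sketch below assumes one common way to turn the "probability of selecting identical pairs" into a diversity score: the inverse of that probability (the sum of squared residue frequencies at a position), normalized by the alphabet size of 20. Under this assumption a fully conserved position scores 1/20 and a uniformly distributed position scores 1. The function names are illustrative and are not taken from the DIVAA software.

```python
from collections import Counter

def positional_diversity(column, alphabet_size=20):
    """Diversity of one alignment column, assumed here to be 1 / (N * sum(p_i^2)).

    sum(p_i^2) is the probability that two residues drawn at random
    (with replacement) from the column are identical.
    """
    n = len(column)
    if n == 0:
        raise ValueError("empty alignment column")
    counts = Counter(column)
    identical_pair_prob = sum((c / n) ** 2 for c in counts.values())
    return 1.0 / (alphabet_size * identical_pair_prob)

def alignment_diversity(aligned_sequences):
    """Per-position diversity for equal-length, gap-free aligned sequences."""
    return [positional_diversity(col) for col in zip(*aligned_sequences)]
```

Calling `alignment_diversity(["GKS", "GKT", "GRS"])` gives the lowest value at the first position, where all three sequences share a glycine.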

3.
Lise S, Jones DT. Proteins. 2005;58(1):144-150
The relationship between amino acid sequence and intrinsic disorder in proteins is investigated. Two databases, one of disordered proteins and the other of globular proteins, are analyzed and compared in order to extract simple sequence patterns of a few amino acids or amino acid properties that characterize disordered segments. It is found that a number of reliable, nonrandom associations exist. In particular, two types of patterns appear to be recurrent: a proline-rich pattern and a (positively or negatively) charged pattern. These results indicate that local sequence information can determine disordered regions in proteins. The derived patterns provide some insights into the physical reasons for disordered structures. They should also be helpful in improving currently available prediction methods.

4.
5.
The Golgi apparatus is an important eukaryotic organelle. Successful prediction of Golgi protein types can provide valuable information for elucidating protein functions involved in various biological processes. In this work, a method is proposed that combines a special mode of pseudo amino acid composition (increment of diversity) with the modified Mahalanobis discriminant for predicting Golgi protein types. The benchmark dataset used to train the predictor contains 95 Golgi proteins, none of which has ≥40% pairwise sequence identity to any other. The accuracy obtained by the jackknife test was 74.7%, with an area under the ROC curve of 0.772, in identifying cis-Golgi proteins and trans-Golgi proteins. Subsequently, the method was extended to discriminate cis-Golgi network proteins from cis-Golgi network membrane proteins and trans-Golgi network proteins from trans-Golgi network membrane proteins. The accuracies thus obtained were 76.1% and 83.7%, respectively. These results indicate that our method may become a useful tool in the relevant areas. As a user-friendly web-server, the predictor is freely accessible at http://immunet.cn/SubGolgi/.
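
The abstract names the increment of diversity but does not define it. The sketch below uses the definition commonly found in the literature, which is an assumption here: for a count vector X with total N, D(X) = N log N − Σ n_i log n_i, and ID(X, Y) = D(X + Y) − D(X) − D(Y). A small ID means the two sources have similar distributions. The modified Mahalanobis discriminant step is not shown.

```python
import math

def diversity(counts):
    """Diversity measure D(X) = N*log(N) - sum(n_i*log(n_i)), with 0*log(0) = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)

def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two equal-length count vectors."""
    if len(x) != len(y):
        raise ValueError("count vectors must have the same length")
    combined = [a + b for a, b in zip(x, y)]
    return diversity(combined) - diversity(x) - diversity(y)
```

In a classifier of this kind, `x` could hold the feature counts of a query protein and `y` the pooled counts of one training class. The query is then scored against each class, and the ID values are passed on as features to the discriminant.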

6.
Lin WZ, Fang JA, Xiao X, Chou KC. PLoS ONE. 2011;6(9):e24756
DNA-binding proteins play crucial roles in various cellular processes. Developing high throughput tools for rapidly and effectively identifying DNA-binding proteins is one of the major challenges in the field of genome annotation. Although many efforts have been made in this regard, further effort is needed to enhance the prediction power. By incorporating features extracted from protein sequences via the "grey model" into the general form of pseudo amino acid composition, and by adopting the random forest operation engine, we proposed a new predictor, called iDNA-Prot, for identifying uncharacterized proteins as DNA-binding proteins or non-DNA binding proteins based on their amino acid sequence information alone. The overall success rate of iDNA-Prot was 83.96%, obtained via jackknife tests on a newly constructed stringent benchmark dataset in which none of the proteins included has ≥25% pairwise sequence identity to any other in the same subset. In addition to achieving a high success rate, the computational time for iDNA-Prot is remarkably shorter than that of the relevant existing predictors. Hence it is anticipated that iDNA-Prot may become a useful high throughput tool for large-scale analysis of DNA-binding proteins. As a user-friendly web-server, iDNA-Prot is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results.

7.
8.
All the protein sequences from the SWISS-PROT database were analyzed for the occurrence of single amino acid repeats, tandem oligo-peptide repeats, and periodically conserved amino acids. Single amino acid repeats of glutamine, serine, glutamic acid, glycine, and alanine seem to be tolerated to a considerable extent in many proteins. Tandem oligo-peptide repeats of different types with varying levels of conservation were detected in several proteins and found to be conspicuous, particularly in structural and cell surface proteins. It appears that repeated sequence patterns may be a mechanism that provides regular arrays of spatial and functional groups, useful for structural packing or for one-to-one interactions with target molecules. To facilitate further explorations, a database of Tandem Repeats in Protein Sequences (TRIPS) has been developed and is available at URL: http://www.ncl-india.org/trips.
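
The two simplest repeat types named in this abstract can be located with regular expressions, as in the sketch below. The minimum run length, the exact-match requirement and the function names are assumptions for illustration. The study itself also reports repeats with "varying levels of conservation", which exact matching does not capture.

```python
import re

def single_residue_repeats(sequence, min_length=5):
    """Find runs of one amino acid at least `min_length` long.

    Returns (residue, start_index, run_length) tuples with 0-based starts.
    """
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    return [
        (m.group(1), m.start(), len(m.group(0)))
        for m in pattern.finditer(sequence.upper())
    ]

def exact_tandem_repeats(sequence, unit_length=3):
    """Find exact tandem repeats of a peptide unit of the given length.

    Returns (unit, start_index, copy_count) tuples. Single-residue runs
    also match, since "QQQ" repeated is a repeat of the unit "QQQ".
    """
    pattern = re.compile(r"([A-Z]{%d})\1+" % unit_length)
    return [
        (m.group(1), m.start(), len(m.group(0)) // unit_length)
        for m in pattern.finditer(sequence.upper())
    ]
```

For instance, `single_residue_repeats("MKQQQQQQAL")` reports one glutamine run of length 6 starting at index 2.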

9.
G protein-coupled receptors (GPCRs) are among the most frequent targets of therapeutic drugs. With the avalanche of newly generated protein sequences in the post-genomic age, it is highly desirable to develop an automated method to rapidly identify GPCRs and their types in order to expedite drug discovery. A new predictor was developed by hybridizing two different modes of pseudo-amino acid composition (PseAAC): the functional domain PseAAC and the low-frequency Fourier spectrum PseAAC. The new predictor is called GPCR-2L, where "2L" means that it is a two-layer predictor: the first-layer prediction engine identifies a query protein as a GPCR or not; if it is, the prediction automatically continues to identify it as belonging to one of the following six types: (1) rhodopsin-like (Class A), (2) secretin-like (Class B), (3) metabotropic glutamate/pheromone (Class C), (4) fungal pheromone (Class D), (5) cAMP receptor (Class E), or (6) frizzled/smoothened family (Class F). The overall success rate of GPCR-2L in identifying proteins as GPCRs or non-GPCRs is over 97.2%, while that in identifying GPCRs among their six types is over 97.8%. These high success rates were derived by rigorous jackknife cross-validation on a stringent benchmark dataset, in which none of the included proteins had ≥40% pairwise sequence identity to any other protein in the same subset. As a user-friendly web-server, GPCR-2L is freely accessible to the public at http://icpr.jci.edu.cn/, where one can obtain the two-level results in about 20 s for a query protein sequence of 500 amino acids. Longer sequences generally take more time. The high success rates reported here indicate that combining functional domain information with low-frequency Fourier spectrum analysis is an effective approach to identifying GPCRs and their types. It is anticipated that GPCR-2L may become a useful tool for both basic research and drug development in the areas related to GPCRs.

10.
Isolation and partial characterization of human parotid basic proteins   Total citations: 3, self-citations: 0, citations by others: 3
Methods are presented for the isolation of basic proteins (Pb proteins) from human parotid saliva collected from humans possessing different alleles at the Pb locus. The proteins were found to be extremely basic, with an isoelectric point above 9.5. They contain approximately 45% of the basic amino acids histidine, lysine, and arginine, and are devoid of cysteine, proline, threonine, valine, methionine, and tryptophan. They are free of carbohydrate. A comparison of the amino acid sequence data of Pb protein to all available amino acid sequences revealed that no sequence similarities exist between the Pb proteins and any other proteins reported, although proteins of similar amino acid compositions have been reported by others. A model is presented that accounts for the several forms of allelic proteins based on observed amino acid sequence differences.

11.
Human cytochrome P450 (CYP450) enzymes mediate over 60% of the phase I-dependent metabolism of clinical drugs. They are also known for polymorphisms that have significant impacts on enzyme activity. In this study, a web-server called SCYPPred was developed for predicting human cytochrome P450 SNPs (single nucleotide polymorphisms) based on the SVM flanking sequence method; SCYPPred can rapidly yield the desired results using amino acid sequence information alone. The web-server is accessible to the public at http://snppred.sjtu.edu.cn. Hopefully, SCYPPred will be a useful bioinformatics tool for elucidating the mutation probability of a specific CYP450 enzyme.

12.
13.
Helix kinks are a common feature of α‐helical membrane proteins, but are thought to be rare in soluble proteins. In this study we find that kinks are a feature of long α‐helices in both soluble and membrane proteins, rather than just transmembrane α‐helices. The apparent rarity of kinks in soluble proteins is due to the relative infrequency of long helices (≥20 residues) in these proteins. We compare length‐matched sets of soluble and membrane helices, and find that the frequency of kinks, the role of proline, the patterns of other amino acids around kinks (allowing for the expected differences in amino acid distributions between the two types of protein), and the effects of hydrogen bonds are the same for the two types of helices. In both types of protein, helices that contain proline in the second and subsequent turns are very frequently kinked. However, there is a sizeable proportion of kinked helices that do not contain a proline in either their sequence or a sequence homolog. Moreover, we observe that in soluble proteins, kinked helices have a structural preference in that they typically point into the solvent. Proteins 2014; 82:1960–1970. © 2014 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.

14.
By introducing the "multi-layer scale", as well as hybridizing gene ontology information and sequential evolution information, a novel predictor, called iLoc-Gpos, has been developed for predicting the subcellular localization of Gram-positive bacterial proteins with both single-location and multiple-location sites. To facilitate comparison, the same stringent benchmark dataset used to estimate the accuracy of Gpos-mPLoc was adopted to demonstrate the power of iLoc-Gpos. The dataset contains 519 Gram-positive bacterial proteins classified into the following four subcellular locations: (1) cell membrane, (2) cell wall, (3) cytoplasm, and (4) extracell; none of the proteins included has ≥25% pairwise sequence identity to any other in the same subset (subcellular location). The overall success rate of iLoc-Gpos in the jackknife test on this stringent benchmark dataset was over 93%, which is about 11% higher than that of Gpos-mPLoc. As a user-friendly web-server, iLoc-Gpos is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iLoc-Gpos or http://www.jci-bioinfo.cn/iLoc-Gpos. A step-by-step guide is also provided on how to use the web-server to get the desired results. Furthermore, for the user's convenience, the iLoc-Gpos web-server also accepts batch job submissions, a function that is not available in the existing version of the Gpos-mPLoc web-server.

15.

Background

The function of a protein can be deciphered with higher accuracy from its structure than from its amino acid sequence. Due to the huge gap between the available protein sequence space and structural space, tools that can generate functionally homogeneous clusters using only sequence information are of great importance. For this, traditional alignment-based tools work well in most cases, and clustering is performed on the basis of sequence similarity. However, in the case of multi-domain proteins, the alignment quality might be poor due to varied lengths of the proteins, domain shuffling or circular permutations. Multi-domain proteins are ubiquitous in nature; hence alignment-free tools, which overcome the shortcomings of alignment-based protein comparison methods, are required. Further, existing tools classify proteins using only domain-level information and hence miss out on the information encoded in the tethered regions or accessory domains. Our method, on the other hand, takes into account the full-length sequence of a protein, consolidating the complete sequence information to understand a given protein better.

Results

Our web-server, CLAP (Classification of Proteins), is one such alignment-free software for automatic classification of protein sequences. It utilizes a pattern-matching algorithm that assigns local matching scores (LMS) to residues that are part of the matched patterns between the two sequences being compared. CLAP works on full-length sequences and does not require prior domain definitions. Pilot studies undertaken previously on protein kinases and immunoglobulins have shown that CLAP yields clusters with high functional and domain architectural similarity. Moreover, parsing at a statistically determined cut-off resulted in clusters that corroborated the sub-family level classification of that particular domain family.

Conclusions

CLAP is a useful protein-clustering tool, independent of domain assignment, domain order, sequence length and domain diversity. Our method can be used for any set of protein sequences, yielding functionally relevant clusters with high domain architectural homogeneity. The CLAP web server is freely available for academic use at http://nslab.mbu.iisc.ernet.in/clap/.

16.
MOTIVATION: Conformational flexibility is essential to the function of many proteins, e.g. catalytic activity. To assist efforts in determining and exploring the functional properties of a protein, it is desirable to automatically identify regions that are prone to undergo conformational changes. It was recently shown that a probabilistic predictor of continuum secondary structure is more accurate than categorical predictors for structurally ambivalent sequence regions, suggesting that such models are suited to characterize protein flexibility. RESULTS: We develop a computational method for identifying regions that are prone to conformational change directly from the amino acid sequence. The method uses the entropy of the probabilistic output of an 8-class continuum secondary structure predictor. Results for 171 unique amino acid sequences with well-characterized variable structure (identified in the 'Macromolecular movements database') indicate that the method is highly sensitive at identifying flexible protein regions, but false positives remain a problem. The method can be used to explore conformational flexibility of proteins (including hypothetical or synthetic ones) whose structure is yet to be determined experimentally. AVAILABILITY: The predictor, sequence data and supplementary studies are available at http://pprowler.itee.uq.edu.au/sspred/ and are free for academic use.
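
The entropy step in this abstract can be sketched as follows, assuming the predictor emits one probability vector over eight secondary-structure classes per residue. The function names and the threshold value are hypothetical; the abstract does not say how the authors turn entropy into a flexible/rigid call. High entropy means the predictor is ambivalent about the local structure, which is taken here as a sign of possible conformational change.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits of a probability vector (assumed to sum to 1)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def flexible_positions(per_residue_probs, threshold=2.0):
    """Return 0-based indices of residues whose entropy meets the threshold.

    `threshold` is an illustrative cut-off in bits; for eight classes the
    entropy ranges from 0 (one class certain) to 3 (all classes equally likely).
    """
    return [
        i for i, probs in enumerate(per_residue_probs)
        if shannon_entropy(probs) >= threshold
    ]
```

A residue predicted as helix with probability 1 has entropy 0 and is never flagged. A residue with all eight classes at 0.125 has the maximum entropy of 3 bits and is flagged at any threshold up to 3.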

17.
Using a previously developed automated method for enzyme annotation, we report the re-annotation of the ENZYME database and the analysis of local error rates per class. In control experiments, we demonstrate that the method is able to correctly re-annotate 91% of all Enzyme Classification (EC) classes with high coverage (755 out of 827). Only 44 enzyme classes are found to contain false positives, while the remaining 28 enzyme classes are not represented. We also show cases where the re-annotation procedure results in partial overlaps for those few enzyme classes where a certain inconsistency might appear between homologous proteins, mostly due to function specificity. Our results allow the interactive exploration of the EC hierarchy for known enzyme families as well as putative enzyme sequences that may need to be classified within the EC hierarchy. These aspects of our framework have been incorporated into a web-server, called CORRIE, which stands for Correspondence Indicator Estimation and allows the interactive prediction of a functional class for putative enzymes from sequence alone, supported by probabilistic measures in the context of the pre-calculated Correspondence Indicators of known enzymes with the functional classes of the EC hierarchy. The CORRIE server is available at: http://www.genomes.org/services/corrie/.
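
The counts reported in this abstract can be checked with a few lines of arithmetic. The variable names are illustrative. The check confirms that the three groups of classes add up to the stated total and that 755 of 827 rounds to the quoted 91%.

```python
# Counts taken directly from the abstract.
TOTAL_EC_CLASSES = 827
CORRECTLY_REANNOTATED = 755
WITH_FALSE_POSITIVES = 44
NOT_REPRESENTED = 28

def percentage(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole

# The three groups partition the full set of EC classes.
groups_total = CORRECTLY_REANNOTATED + WITH_FALSE_POSITIVES + NOT_REPRESENTED

# Share of classes that were re-annotated correctly.
correct_pct = percentage(CORRECTLY_REANNOTATED, TOTAL_EC_CLASSES)

print(groups_total == TOTAL_EC_CLASSES)
print(round(correct_pct, 1))
```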

18.
Using data from the Protein Data Bank, the correlations between the primary and secondary structures of proteins were analyzed. The correlation values between the amino acids and the eight secondary structure types were calculated for cases where the position of the amino acid and the position in the sequence with the particular secondary structure differ by at most 25. The diagrams describing these results indicate that correlations are significant at distances between −9 and 10. The results show that the substituents on the Cβ or Cγ atoms of an amino acid play a major role in its preference for a particular secondary structure at the same position in the sequence, while the polarity of an amino acid has a significant influence on α-helices and strands at some distance in the sequence. The diagrams corresponding to polar amino acids are noticeably asymmetric. The diagrams point out the exchangeability of residues in proteins; amino acids with similar diagrams have similar local folding requirements. Electronic supplementary material  The online version of this article (doi:) contains supplementary material, which is available to authorized users.

19.
Membrane proteins are vitally important for many biological processes and have become an attractive target for both basic research and drug design. Knowledge of membrane protein types often provides useful clues in deducing the functions of uncharacterized membrane proteins. With the unprecedented increase in newly found protein sequences in the post-genomic era, there is high demand for an automated method that quickly and accurately identifies the types of membrane proteins from their amino acid sequences. Although quite a few identifiers have been developed in this regard through various approaches, such as covariant discriminant (CD), support vector machine (SVM), artificial neural network (ANN), and K-nearest neighbor (KNN) classifiers, the way they perform the identification is basically individual. As is well known, wise persons usually take into account the opinions of several experts rather than rely on only one when making critical decisions. Likewise, a sophisticated identifier should be trained by several different modes. In view of this, based on the framework of pseudo-amino acid composition, which can incorporate a considerable amount of sequence-order effects, a novel approach called "stacked generalization" or "stacking" has been introduced. Unlike the "bagging" and "boosting" approaches, which only combine classifiers of the same type, the stacking approach can combine several different types of classifiers through a meta-classifier to maximize the generalization accuracy. The results thus obtained were very encouraging. It is anticipated that the stacking approach may also hold high potential to improve the identification quality for, among many other protein attributes, subcellular location, enzyme family class, protease type, and protein-protein interaction type. The stacked generalization classifier is available as a web-server named "SG-MPt_Pred" at: http://202.120.37.186/bioinf/wangsq/service.htm.

20.
Multiple variability in the sequence of a family of maize endosperm proteins   Total citations: 10, self-citations: 0, citations by others: 10
A collection of cDNA clones, corresponding to a group of maize endosperm proteins classified in the glutelin-2 (or reduced soluble proteins) and zein-2 subfractions, has been identified and characterized. The nucleotide sequence of three of these clones has been obtained and the amino acid sequence deduced. They appear to correspond to a small family of genes that are specifically expressed in immature endosperm simultaneously with zeins, the best characterized proteins from this tissue. Unlike zeins, the proteins of the glutelin-2 and zein-2 family contain sequences homologous to storage proteins from other cereals, such as gliadins or hordeins. The cDNA clones encoding the two types of proteins have been compared, and a high degree of homology has been observed for both the nucleotide and amino acid sequences. The differences existing in both the coding and non-coding regions allow the definition of multiple types of variability in their sequence. A hypothesis is proposed on how sequence diversity may have been generated in this particular class of plant proteins.
