Similar literature
Found 20 similar articles (search time: 31 ms)
1.
Using supervised fuzzy clustering to predict protein structural classes (total citations: 2; self-citations: 0; citations by others: 2)
Prediction of protein classification is both an important and a tempting topic in protein science. This is not only because the knowledge thus obtained can provide useful information about the overall structure of a query protein, but also because the practice itself can stimulate the development of novel predictors that may be straightforwardly applied to many other relevant areas. In this paper, a novel approach, the so-called "supervised fuzzy clustering approach", is introduced; its distinguishing feature is that it uses the class label information during the training process. Based on this approach, a set of "if-then" fuzzy rules for predicting the protein structural classes is extracted from a training dataset. It has been demonstrated on two different working datasets that the overall prediction success rates obtained by the supervised fuzzy clustering approach are all higher than those obtained by the unsupervised fuzzy c-means introduced by previous investigators [C.T. Zhang, K.C. Chou, G.M. Maggiora. Protein Eng. (1995) 8, 425-435]. It is anticipated that the current predictor may play an important complementary role to other existing predictors in this area, further strengthening the power to predict the structural classes of proteins and their other characteristic attributes.
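For readers unfamiliar with the unsupervised fuzzy c-means baseline mentioned above, the following is a minimal sketch of its standard membership formula, in which a point's degree of membership in each cluster is computed from its distances to all cluster centers. It is not the supervised variant described in the abstract; the function name `fcm_memberships`, the fuzzifier default `m=2.0`, and the toy coordinates are assumptions made for illustration only.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in each cluster.

    Illustrative sketch only: `m` is the fuzzifier (m > 1), and the
    inputs are hypothetical feature vectors, not data from the paper.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers is shared among them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)); memberships sum to 1.
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
            for d_i in dists]

# Toy usage: a point closer to the first of two centers.
print(fcm_memberships((0.5, 0.0), [(1.0, 0.0), (-1.0, 0.0)]))
```

The memberships always sum to one, so a point between clusters is shared rather than forced into a single class, which is what makes "if-then" fuzzy rules possible.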

2.
Prediction of protein classification is an important topic in molecular biology. This is because it not only provides useful information from the viewpoint of structure itself, but also greatly stimulates the characterization of many other features of proteins that may be closely correlated with their biological functions. In this paper, LogitBoost, one of the recently developed boosting algorithms, is introduced for predicting protein structural classes. It performs classification using a regression scheme as the base learner, which can handle multi-class problems and copes particularly well with noisy data. It was demonstrated that LogitBoost outperformed support vector machines in predicting the structural classes for a given dataset, indicating that the new classifier is very promising. It is anticipated that the power to predict protein structural classes, as well as many other bio-macromolecular attributes, will be further strengthened if LogitBoost and other existing algorithms can effectively complement each other.

3.
Li ZC, Zhou XB, Dai Z, Zou XY. Amino Acids 2009, 37(2): 415-425
Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and accurate computational determination of protein structural class is very important in protein science. One of the keys to a computational method is an accurate representation of the protein sample. Here, based on the concept of Chou’s pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246–255, 2001), a novel feature extraction method that combines the continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. Firstly, a digital signal was obtained by mapping each amino acid according to various physicochemical properties. Secondly, CWT was used to extract a new feature vector based on the wavelet power spectrum (WPS), which contains more abundant sequence-order information in the frequency and time domains, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was formed to represent the primary sequence by coupling the AAC vector with a set of new WPS feature vectors in an orthogonal space obtained by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality was improved, and the current approach to protein representation may serve as a useful complementary vehicle for classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein type and protein secondary structure.

4.
Knowledge of a protein's structural class can provide important information about its folding patterns. Many approaches have been developed for the prediction of protein structural classes. However, the information used by these approaches is primarily based on amino acid sequences. In this study, a novel method is presented to predict protein structural classes using chemical shift (CS) information derived from nuclear magnetic resonance spectra. Firstly, a dataset of 399 non-homologous proteins (about 15% identity) was constructed to investigate the distribution of the averaged CS values of six nuclei (¹³CO, ¹³Cα, ¹³Cβ, ¹HN, ¹Hα and ¹⁵N) in three protein structural classes. Subsequently, a support vector machine was used to predict the three protein structural classes from the averaged CS information of the six nuclei. The overall accuracy in jackknife cross-validation reached 87.0%. Finally, a feature selection technique was applied to exclude redundant information and find an optimized feature set. The results show that the overall accuracy increased to 88.0% when the averaged CSs of ¹³CO, ¹Hα and ¹⁵N were used. The proposed approach outperformed other state-of-the-art methods in predictive accuracy, in particular for low-similarity protein data. We expect that our proposed approach will be an excellent alternative to traditional methods for protein structural class prediction.

5.
6.
Terwilliger TC, Berendzen J. Genetica 1999, 106(1-2): 141-147
The genome projects are changing biology by providing the genetic blueprints of entire organisms. The blueprints are tantalizing, but we cannot deduce everything we need to know from them, including the structures and detailed functions of proteins. In this paper we describe an approach for obtaining structural information about proteins on a genomic scale. We describe how structural and functional information might eventually be put together to form a basis for describing life at many levels. We then describe how structural information fits into this picture and the classes of proteins for which structural information would be useful in a genomic context. We conclude with a proposal for an initiative to determine protein structures on a very large scale. This revised version was published online in October 2005 with corrections to the Cover Date.

7.
Protein-protein interactions (PPIs) play an important role in biological processes. Although much effort has been devoted to the identification of novel PPIs by integrating experimental biological knowledge, many difficulties remain owing to a lack of sufficient protein structural and functional information. It is highly desirable to develop methods for predicting PPIs that are based only on amino acid sequences. However, sequence-based predictors often struggle with high dimensionality, which causes over-fitting and high computational complexity, as well as with redundancy in the sequence feature vectors. In this paper, a novel computational approach based on compressed sensing theory is proposed to predict yeast Saccharomyces cerevisiae PPIs from primary sequence, and it has achieved promising results. The key advantage of the proposed compressed sensing algorithm is that it can compress the original high-dimensional protein sequence feature vector into a much lower-dimensional but more condensed space by taking the sparsity of the original signal into account. What makes compressed sensing especially attractive for protein sequence analysis is that the compressed signal can be reconstructed from far fewer measurements than are usually considered necessary in traditional Nyquist sampling theory. Experimental results demonstrate that the proposed compressed sensing method is powerful for analyzing noisy biological data and reducing redundancy in feature vectors. The proposed method represents a new strategy for dealing with high-dimensional discrete protein models and has great potential to be extended to many other complicated biological systems.

8.
Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the bound ligand with 74% precision and 82% recall on average. For many protein targets, it identifies high-scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or as refinement criteria for creating target-specific compound libraries for experimental or computational screening.

9.
10.
Correlations of amino acids in proteins (total citations: 2; self-citations: 0; citations by others: 2)
Du Q, Wei D, Chou KC. Peptides 2003, 24(12): 1863-1869
A correlation analysis among the 20 amino acids is performed for four protein structural classes (α, β, α/β, and α+β) in a total of 204 proteins. The correlation relationships among amino acids can be classified into the following four types: (1) strong positive correlation, (2) strong negative correlation, (3) weak correlation, and (4) no correlation. The correlation relationships differ between proteins and are correlated with the features of their structural classes. The amino acids with weak correlation relationships can be treated as independent basis functions for the space in which proteins are defined. The amino acids with large correlation coefficients are linearly correlated with each other and are not independent. The strong correlation among amino acids reflects their mutually constrained relationship, as exhibited by their relevant structural features. The information obtained through the correlation analysis is used for predicting protein structural classes, and a better prediction quality is obtained than with the simple geometric distance methods that do not take the correlation effects into account.
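One plausible way to picture such a correlation analysis is to compute each protein's amino acid composition and then the Pearson correlation between two residues' fractions across a set of proteins. The sketch below makes exactly that assumption; the abstract does not specify the statistic used, and the helper names (`composition`, `pearson`, `residue_correlation`) and the toy sequences are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Fraction of each standard amino acid in a (non-empty) sequence."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total for c in counts]

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def residue_correlation(seqs, a, b):
    """Correlation between the fractions of residues a and b across seqs."""
    comps = [composition(s) for s in seqs]
    ia, ib = AMINO_ACIDS.index(a), AMINO_ACIDS.index(b)
    return pearson([c[ia] for c in comps], [c[ib] for c in comps])

# Toy usage with made-up sequences: as alanine decreases, leucine increases.
print(residue_correlation(["AAAL", "AALL", "ALLL"], "A", "L"))
```

A coefficient near +1 or -1 would correspond to the "strong" correlation types in the abstract, and a value near 0 to the weak or absent ones.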

11.
Protein recognition is one of the most challenging and intriguing problems in structural biology. Despite all the available structural, sequence and biophysical information about protein-protein complexes, the physico-chemical patterns, if any, that make a protein surface likely to be involved in protein-protein interactions remain elusive. Here, we apply protein docking simulations and analysis of the interaction energy landscapes to identify protein-protein interaction sites. The new protocol for global docking, based on multi-start global energy optimization of an all-atom model of the ligand, with detailed receptor potentials and atomic solvation parameters optimized in a training set of 24 complexes, explores the conformational space around the whole receptor without restrictions. The ensembles of rigid-body docking solutions generated by the simulations were subsequently used to project the docking energy landscapes onto the protein surfaces. We found that highly populated low-energy regions consistently corresponded to actual binding sites. The procedure was validated on a test set of 21 known protein-protein complexes not used in the training set. As many as 81% of the predicted high-propensity patch residues were located correctly in the native interfaces. This approach can guide the design of mutations on the surfaces of proteins, provide geometrical details of a possible interaction, and help to annotate protein surfaces in structural proteomics.

12.
SLLE for predicting membrane protein types (total citations: 2; self-citations: 0; citations by others: 2)
The introduction of the concept of pseudo amino acid composition (PROTEINS: Structure, Function, and Genetics 43 (2001) 246; Erratum: ibid. 44 (2001) 60) has made it possible to incorporate a considerable amount of sequence-order effects by representing a protein sample in terms of a set of discrete numbers, and hence can significantly enhance the prediction quality of membrane protein type. As a continuing effort along this line, the Supervised Locally Linear Embedding (SLLE) technique for nonlinear dimensionality reduction is introduced (Science 22 (2000) 2323). The advantage of using SLLE is that it can reduce the operational space by extracting the essential features from the high-dimensional pseudo amino acid composition space, and that the cluster-tolerant capacity can be increased accordingly. As a consequence of combining these two approaches, high success rates were observed in the self-consistency, jackknife and independent dataset tests using the simplest nearest neighbour classifier. The current approach represents a new strategy for dealing with problems of protein attribute prediction, and hence may become a useful vehicle in the area of bioinformatics and proteomics.
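The jackknife test with a nearest neighbour classifier mentioned above can be sketched as leave-one-out evaluation: each sample is held out in turn and classified by its closest remaining sample. This is a minimal illustration under assumptions, not the authors' pipeline; it omits the pseudo amino acid composition and SLLE steps, and the function names and toy feature vectors are hypothetical.

```python
import math

def nearest_neighbor_label(query, samples, labels):
    """Label of the training sample closest to `query` (Euclidean distance)."""
    best = min(range(len(samples)), key=lambda i: math.dist(query, samples[i]))
    return labels[best]

def jackknife_accuracy(samples, labels):
    """Leave-one-out (jackknife) success rate of the 1-NN classifier."""
    correct = 0
    for i in range(len(samples)):
        # Hold out sample i and classify it using all remaining samples.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(samples[i], train_x, train_y) == labels[i]:
            correct += 1
    return correct / len(samples)

# Toy usage with made-up 2-D feature vectors and placeholder type labels.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["type I", "type I", "type II", "type II"]
print(jackknife_accuracy(samples, labels))
```

Because every sample is tested exactly once and never appears in its own training set, the jackknife rate is deterministic for a given dataset, unlike a randomly split subsampling test.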

13.
Cell membranes are vitally important to the life of a cell. Although the basic structure of a biological membrane is provided by the lipid bilayer, membrane proteins perform most of the specific functions. Membrane proteins are putatively classified into five different types. Identification of their types is currently an important topic in bioinformatics and proteomics. In this paper, based on the concept of representing protein samples in terms of their pseudo-amino acid composition (Chou, K.C., 2001. Prediction of protein cellular attributes using pseudo amino acid composition. Proteins: Struct. Funct. Genet. 43, 246-255), the fuzzy K-nearest neighbors (KNN) algorithm has been introduced to predict membrane protein types, and high success rates were observed. It is anticipated that the current approach, which is based on a branch of fuzzy mathematics and represents a new strategy, may play an important complementary role to the existing methods in this area. The novel approach may also have a notable impact on the prediction of other attributes, such as protein structural class, protein subcellular localization, and enzyme family class, among many others.
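As a rough illustration of the fuzzy KNN idea, the sketch below assigns a query a graded membership in each class by weighting its k nearest neighbours by inverse distance. It follows the commonly used distance-weighted formulation with crisp training labels, which is an assumption here; the abstract does not give the exact variant, and the function name, parameter defaults, and toy data are hypothetical.

```python
import math

def fuzzy_knn(query, samples, labels, k=3, m=2.0):
    """Class memberships of `query` from its k nearest labelled samples.

    Sketch only: neighbours vote with weight 1 / d^(2 / (m - 1)),
    where m > 1 is the fuzzifier; memberships are normalized to sum to 1.
    """
    ranked = sorted(zip(samples, labels),
                    key=lambda pair: math.dist(query, pair[0]))[:k]
    exponent = 2.0 / (m - 1.0)
    weights = {}
    for sample, label in ranked:
        # Clamp the distance so an exact match does not divide by zero.
        d = max(math.dist(query, sample), 1e-12)
        weights[label] = weights.get(label, 0.0) + 1.0 / d ** exponent
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Toy usage with made-up 2-D feature vectors and placeholder type labels.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["type I", "type I", "type II", "type II"]
memberships = fuzzy_knn((0.2, 0.2), samples, labels)
print(max(memberships, key=memberships.get), memberships)
```

Unlike plain KNN, the output is a membership for every class seen among the neighbours, so a borderline query can be flagged instead of being assigned with unwarranted certainty.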

14.
Deciphering the native conformation of proteins from their amino acid sequences is one of the most challenging problems in molecular biology. Information on the secondary structure of a protein can be helpful in understanding its native folded state. In our earlier work on molecular chaperones, we analyzed the hydrophobic and charged patches, the short-, medium- and long-range contacts, and the residue distributions along the sequence. In this article, we attempt to predict the structural class of globular and chaperone proteins based on the information obtained from residue distributions. This method predicts the structural class with an accuracy of 93 and 96%, respectively, for the four- and three-state models in a training set of 120 globular proteins, and 90 and 96%, respectively, for a test set of 80 proteins. We have used this information and methodology to predict the structural classes of chaperones. Interestingly, most of the chaperone proteins are predicted to be of the alpha/beta or mixed folding type.

15.
Our understanding of how steroid hormones regulate physiological functions has been significantly advanced by structural biology approaches. However, progress has been hampered by misfolding of the ligand binding domains in heterologous expression systems and by conformational flexibility that interferes with crystallization. Here, we show that protein folding problems that are common to steroid hormone receptors are circumvented by mutations that stabilize well-characterized conformations of the receptor. We use this approach to present the structure of an apo steroid receptor that reveals a ligand-accessible channel allowing soaking of preformed crystals. Furthermore, crystallization of different pharmacological classes of compounds allowed us to define the structural basis of NFkappaB-selective signaling through the estrogen receptor, thus revealing a unique conformation of the receptor that allows selective suppression of inflammatory gene expression. The ability to crystallize many receptor-ligand complexes with distinct pharmacophores allows one to define structural features of signaling specificity that would not be apparent in a single structure.

16.
Detection of similarity is particularly difficult for small proteins and thus connections between many of them remain unnoticed. Structure and sequence analysis of several metal-binding proteins reveals unexpected similarities in structural domains classified as different protein folds in SCOP and suggests unification of seven folds that belong to two protein classes. The common motif, termed treble clef finger in this study, forms the protein structural core and is 25-45 residues long. The treble clef motif is assembled around the central zinc ion and consists of a zinc knuckle, loop, beta-hairpin and an alpha-helix. The knuckle and the first turn of the helix each incorporate two zinc ligands. Treble clef domains constitute the core of many structures such as ribosomal proteins L24E and S14, RING fingers, protein kinase cysteine-rich domains, nuclear receptor-like fingers, LIM domains, phosphatidylinositol-3-phosphate-binding domains and His-Me finger endonucleases. The treble clef finger is a uniquely versatile motif adaptable for various functions. This small domain with a 25 residue structural core can accommodate eight different metal-binding sites and can have many types of functions from binding of nucleic acids, proteins and small molecules, to catalysis of phosphodiester bond hydrolysis. Treble clef motifs are frequently incorporated in larger structures or occur in doublets. Present analysis suggests that the treble clef motif defines a distinct structural fold found in proteins with diverse functional properties and forms one of the major zinc finger groups.

17.
Ecological Informatics 2007, 2(2): 128-137
Assessment of ecosystem viability is an important requirement for conservation planning. Valuable ecosystems that are less viable under external non-natural pressures deserve more protection. Such an assessment is a multiple attribute analysis process because several decision criteria are involved. The problem in such a process is that the required information is not always precisely defined. In addition, the interactions of an ecosystem with the surrounding ecosystems and with external non-natural disturbance cannot be effectively expressed via simple quantified indicators. In this sense, assessing viability in qualitative terms provides an opportunity to use experts’ judgments to comprehensively address the viability attributes and, in particular, the interactions of ecosystems with each other and with external disturbance and pressure factors. To this end, this study proposes a methodology for viability assessment of ecosystems based on joint consideration of the theory of multiple attribute analysis and fuzzy set theory to deal with qualitative and imprecise information. A novel approach based on the conjunction implication method is constructed that is capable of considering fuzziness in both partial scores and weights. The method is examined using three classes of ordering approaches proposed in the literature. To illustrate the usefulness and applicability of the proposed approach, it is employed in a case study of viability assessment of ecosystems within the reach area of an oil–sand mining project.

18.
We have developed a method to determine the three-dimensional structure of a protein molecule from such a set of distance constraints as can be determined by nuclear magnetic resonance studies. The currently popular methods for distance geometry based on the use of the metric matrix are applicable only to small systems. The method developed here is applicable to large molecules, such as proteins, with all atoms treated explicitly. This method works in the space of variable dihedral angles and determines a three-dimensional structure by minimization of a target function. We avoid difficulties hitherto inherent in this type of approach by two new devices: the use of variable target functions; and a method of rapid calculation of the gradient of the target functions. The method is applied to the determination of the structures of a small globular protein, bovine pancreatic trypsin inhibitor, from several artificial sets of distance constraints extracted from the X-ray crystal structure of this molecule. When a good set of constraints was available for both short- and long-range distances, the crystal structure was regenerated nearly exactly. When some ambiguities, such as those expected in experimental information, are allowed, the protein conformation can be determined up to a few local deformations. These ambiguities are mainly associated with the low resolving power of the short-range information.

19.
Functional classification of proteins from sequences alone has become a critical bottleneck in understanding the myriad of protein sequences that accumulate in our databases. The great diversity of homologous sequences hides, in many cases, a variety of functional activities that cannot be anticipated. Their identification appears critical for a fundamental understanding of the evolution of living organisms and for biotechnological applications. ProfileView is a sequence-based computational method, designed to functionally classify sets of homologous sequences. It relies on two main ideas: the use of multiple profile models whose construction explores evolutionary information in available databases, and a novel definition of a representation space in which to analyze sequences with multiple profile models combined together. ProfileView classifies protein families by enriching known functional groups with new sequences and discovering new groups and subgroups. We validate ProfileView on seven classes of widespread proteins involved in the interaction with nucleic acids, amino acids and small molecules, and in a large variety of functions and enzymatic reactions. ProfileView agrees with the large set of functional data collected for these proteins from the literature regarding the organization into functional subgroups and residues that characterize the functions. In addition, ProfileView resolves undefined functional classifications and extracts the molecular determinants underlying protein functional diversity, showing its potential to select sequences towards accurate experimental design and discovery of novel biological functions. On protein families with complex domain architecture, ProfileView functional classification reconciles domain combinations, unlike phylogenetic reconstruction. 
ProfileView proves to outperform the functional classification approach PANTHER, the two k-mer-based methods CUPP and eCAMI and a neural network approach based on Restricted Boltzmann Machines. It overcomes time complexity limitations of the latter.

20.

Background  

Soon after the first algorithms for RNA folding became available, it was recognised that the prediction of only one energetically optimal structure is insufficient to achieve reliable results. An in-depth analysis of the folding space as a whole appeared necessary to deduce the structural properties of a given RNA molecule reliably. Folding space analysis comprises various methods such as suboptimal folding, computation of base pair probabilities, sampling procedures and abstract shape analysis. Common to many approaches is the idea of partitioning the folding space into classes of structures, for which certain properties can be derived.
