首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 9 毫秒
1.
2.
Automated discovery of 3D motifs for protein function annotation   总被引:2,自引:0,他引:2  
MOTIVATION: Function inference from structure is facilitated by the use of patterns of residues (3D motifs), normally identified by expert knowledge, that correlate with function. As an alternative to often limited expert knowledge, we use machine-learning techniques to identify patterns of 3-10 residues that maximize function prediction. This approach allows us to test the assumption that residues that provide function are the most informative for predicting function. RESULTS: We apply our method, GASPS, to the haloacid dehalogenase, enolase, amidohydrolase and crotonase superfamilies and to the serine proteases. The motifs found by GASPS are as good at function prediction as 3D motifs based on expert knowledge. The GASPS motifs with the greatest ability to predict protein function consist mainly of known functional residues. However, several residues with no known functional role are equally predictive. For four groups, we show that the predictive power of our 3D motifs is comparable with or better than approaches that use the entire fold (Combinatorial-Extension) or sequence profiles (PSI-BLAST). AVAILABILITY: Source code is freely available for academic use by contacting the authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.  相似文献   

3.
This paper presents a novel algorithm for the discovery of biological sequence motifs. Our motivation is the prediction of gene function. We seek to discover motifs and combinations of motifs in the secondary structure of proteins for application to the understanding and prediction of functional classes. The motifs found by our algorithm allow both flexible length structural elements and flexible length gaps and can be of arbitrary length. The algorithm is based on neither top-down nor bottom-up search, but rather is dichotomic. It is also "anytime," so that fixed termination of the search is not necessary. We have applied our algorithm to yeast sequence data to discover rules predicting function classes from secondary structure. These resultant rules are informative, consistent with known biology, and a contribution to scientific knowledge. Surprisingly, the rules also demonstrate that secondary structure prediction algorithms are effective for membrane proteins and suggest that the association between secondary structure and function is stronger in membrane proteins than globular ones. We demonstrate that our algorithm can successfully predict gene function directly from predicted secondary structure; e.g., we correctly predict the gene YGL124c to be involved in the functional class "cytoplasmic and nuclear degradation." Datasets and detailed results (generated motifs, rules, evaluation on test dataset, and predictions on unknown dataset) are available at www.aber.ac.uk/compsci/Research/bio/dss/yeast.ss.mips/, and www.genepredictions.org.  相似文献   

4.
Diuron, a chlorine-substituted dimethyl herbicide, is widely used in agriculture. Though the degradation of diuron in water has been studied much with experiments, little is known about the detailed degradation mechanism from the molecular level. In this work, the degradation mechanisms for OH-induced reactions of diuron in water phase are investigated at the MPWB1K/6–311+G(3df,2p)//MPWB1K/6–31+G(d,p) level with polarizable continuum model (PCM) calculation. Three reaction types including H-atom abstraction, addition, and substitution are identified. For H-atom abstraction reactions, the calculation results show that the reaction abstracting H atom from the methyl group has the lowest energy barrier; the potential barrier of ortho- H (H1’) abstraction is higher than the meta- H abstraction, and the reason is possibly that part of the potential energy is to overcome the side chain torsion for the H1’ abstraction reaction. For addition pathways, the ortho- site (C (2) atom) is the most favorable site that OH may first attack; the potential barriers for OH additions to the ortho- sites (pathways R7 and R8) and the chloro-substituted para- site (R10) are lower than other sites, indicating the ortho- and para- sites are more favorable to be attacked, matching well with the -NHCO- group as an ortho-para directing group.
Figure
Representative pathways including abstraction, addition and substitution for OH and diuron reactions  相似文献   

5.
Protein structure prediction (PSP) is a significant area for biological information research, disease treatment, and drug development and so on. In this paper, three-dimensional structures of proteins are predicted based on the known amino acid sequences, and the structure prediction problem is transformed into a typical NP problem by an AB off-lattice model. This work applies a novel improved Stochastic Fractal Search algorithm (ISFS) to solve the problem. The Stochastic Fractal Search algorithm (SFS) is an effective evolutionary algorithm that performs well in exploring the search space but falls into local minimums sometimes. In order to avoid the weakness, Lvy flight and internal feedback information are introduced in ISFS. In the experimental process, simulations are conducted by ISFS algorithm on Fibonacci sequences and real peptide sequences. Experimental results prove that the ISFS performs more efficiently and robust in terms of finding the global minimum and avoiding getting stuck in local minimums.  相似文献   

6.
MOTIVATION: The detection of function-related local 3D-motifs in protein structures can provide insights towards protein function in absence of sequence or fold similarity. Protein loops are known to play important roles in protein function and several loop classifications have been described, but the automated identification of putative functional 3D-motifs in such classifications has not yet been addressed. This identification can be used on sequence annotations. RESULTS: We evaluated three different scoring methods for their ability to identify known motifs from the PROSITE database in ArchDB. More than 500 new putative function-related motifs not reported in PROSITE were identified. Sequence patterns derived from these motifs were especially useful at predicting precise annotations. The number of reliable sequence annotations could be increased up to 100% with respect to standard BLAST. CONTACT: boliva@imim.es SUPPLEMENTARY INFORMATION: Supplementary Data are available at Bioinformatics online.  相似文献   

7.

Background

Large amounts of data are being generated by high-throughput genome sequencing methods. But the rate of the experimental functional characterization falls far behind. To fill the gap between the number of sequences and their annotations, fast and accurate automated annotation methods are required. Many methods, such as GOblet, GOFigure, and Gotcha, are designed based on the BLAST search. Unfortunately, the sequence coverage of these methods is low as they cannot detect the remote homologues. Adding to this, the lack of annotation specificity advocates the need to improve automated protein function prediction.

Results

We designed a novel automated protein functional assignment method based on the neural response algorithm, which simulates the neuronal behavior of the visual cortex in the human brain. Firstly, we predict the most similar target protein for a given query protein and thereby assign its GO term to the query sequence. When assessed on test set, our method ranked the actual leaf GO term among the top 5 probable GO terms with accuracy of 86.93%.

Conclusions

The proposed algorithm is the first instance of neural response algorithm being used in the biological domain. The use of HMM profiles along with the secondary structure information to define the neural response gives our method an edge over other available methods on annotation accuracy. Results of the 5-fold cross validation and the comparison with PFP and FFPred servers indicate the prominent performance by our method. The program, the dataset, and help files are available at http://www.jjwanglab.org/NRProF/.
  相似文献   

8.
We refined the motifs for carboxy-terminal protein prenylation by analysis of known substrates for farnesyltransferase (FT), geranylgeranyltransferase I (GGT1) and geranylgeranyltransferase II (GGT2). In addition to the CaaX box for the first two enzymes, we identify a preceding linker region that appears constrained in physicochemical properties, requiring small or flexible, preferably hydrophilic, amino acids. Predictors were constructed on the basis of sequence and physical property profiles, including interpositional correlations, and are available as the Prenylation Prediction Suite (PrePS, ) which also allows evaluation of evolutionary motif conservation. PrePS can predict partially overlapping substrate specificities, which is of medical importance in the case of understanding cellular action of FT inhibitors as anticancer and anti-parasite agents.  相似文献   

9.
Zhang H 《Proteins》1999,34(4):464-471
A new Hybrid Monte Carlo (HMC) algorithm has been developed to test protein potential functions and, ultimately, refine protein structures. The main principle of this algorithm is, in each cycle, a new trial conformation is generated by carrying out a short period of molecular dynamics (MD) iterations with a set of random parameters (including the MD time step, the number of MD steps, the MD temperature, and the seed for initial MD velocity assignment); then to accept or reject the new conformation on the basis of the Metropolis criterion. The novelty in this paper is that the potential in MD iterations is different from that in the MC step. In the former, it is a molecular mechanics potential, in the latter it is a knowledge-based potential (KBP). Directed by the KBP, the MD iteration is used to search conformational space for realistic conformations with low KBP energy. It circumvents the difficulty in using KBP functions directly in MD simulation, as KBP functions are typically incomplete, and do not always have continuous derivatives required for the calculation of the forces. The new algorithm has been tested in explorations of conformational space. In these test calculations the KBP energy was found to drop below the value for the native conformation, and the correlation between the root mean square deviation (RMSD) and the KBP energy was shown to be different from the test results in other references. At the present time, the algorithm is useful for testing new KBP functions. Furthermore, if a KBP function can be found for which the native conformation has the lowest energy and the energy/RMSD correlation is good, then this new algorithm also will be a tool for refinement of the theory-based structural models.  相似文献   

10.
I-TASSER server for protein 3D structure prediction   总被引:5,自引:0,他引:5  

Background  

Prediction of 3-dimensional protein structures from amino acid sequences represents one of the most important problems in computational structural biology. The community-wide Critical Assessment of Structure Prediction (CASP) experiments have been designed to obtain an objective assessment of the state-of-the-art of the field, where I-TASSER was ranked as the best method in the server section of the recent 7th CASP experiment. Our laboratory has since then received numerous requests about the public availability of the I-TASSER algorithm and the usage of the I-TASSER predictions.  相似文献   

11.

Background  

Since many of the new protein structures delivered by high-throughput processes do not have any known function, there is a need for structure-based prediction of protein function. Protein 3D structures can be clustered according to their fold or secondary structures to produce classes of some functional significance. A recent alternative has been to detect specific 3D motifs which are often associated to active sites. Unfortunately, there are very few known 3D motifs, which are usually the result of a manual process, compared to the number of sequential motifs already known. In this paper, we report a method to automatically generate 3D motifs of protein structure binding sites based on consensus atom positions and evaluate it on a set of adenine based ligands.  相似文献   

12.
In a cell, it has been estimated that each protein on average interacts with roughly 10 others, resulting in tens of thousands of proteins known or suspected to have interaction partners; of these, only a tiny fraction have solved protein structures. To partially address this problem, we have developed M-TASSER, a hierarchical method to predict protein quaternary structure from sequence that involves template identification by multimeric threading, followed by multimer model assembly and refinement. The final models are selected by structure clustering. M-TASSER has been tested on a benchmark set comprising 241 dimers having templates with weak sequence similarity and 246 without multimeric templates in the dimer library. Of the total of 207 targets predicted to interact as dimers, 165 (80%) were correctly assigned as interacting with a true positive rate of 68% and a false positive rate of 17%. The initial best template structures have an average root mean-square deviation to native of 5.3, 6.7, and 7.4 Å for the monomer, interface, and dimer structures. The final model shows on average a root mean-square deviation improvement of 1.3, 1.3, and 1.5 Å over the initial template structure for the monomer, interface, and dimer structures, with refinement evident for 87% of the cases. Thus, we have developed a promising approach to predict full-length quaternary structure for proteins that have weak sequence similarity to proteins of solved quaternary structure.  相似文献   

13.

Background  

Automated protein function prediction methods are needed to keep pace with high-throughput sequencing. With the existence of many programs and databases for inferring different protein functions, a pipeline that properly integrates these resources will benefit from the advantages of each method. However, integrated systems usually do not provide mechanisms to generate customized databases to predict particular protein functions. Here, we describe a tool termed PIPA (Pipeline for Protein Annotation) that has these capabilities.  相似文献   

14.
15.
16.
Protein docking algorithms can be used to study the driving forces and reaction mechanisms of docking processes. They are also able to speed up the lengthy process of experimental structure elucidation of protein complexes by proposing potential structures. In this paper, we are discussing a variant of the protein-protein docking problem, where the input consists of the tertiary structures of proteins A and B plus an unassigned one-dimensional 1H-NMR spectrum of the complex AB. We present a new scoring function for evaluating and ranking potential complex structures produced by a docking algorithm. The scoring function computes a `theoretical' 1H-NMR spectrum for each tentative complex structure and subtracts the calculated spectrum from the experimental one. The absolute areas of the difference spectra are then used to rank the potential complex structures. In contrast to formerly published approaches (e.g. [Morelli et al. (2000) Biochemistry, 39, 2530–2537]) we do not use distance constraints (intermolecular NOE constraints). We have tested the approach with four protein complexes whose three-dimensional structures are stored in the PDB data bank [Bernstein et al. (1977)] and whose 1H-NMR shift assignments are available from the BMRB database. The best result was obtained for an example, where all standard scoring functions failed completely. Here, our new scoring function achieved an almost perfect separation between good approximations of the true complex structure and false positives.  相似文献   

17.
The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access.  相似文献   

18.
There have been steady improvements in protein structure prediction during the past 2 decades. However, current methods are still far from consistently predicting structural models accurately with computing power accessible to common users. Toward achieving more accurate and efficient structure prediction, we developed a number of novel methods and integrated them into a software package, MUFOLD. First, a systematic protocol was developed to identify useful templates and fragments from Protein Data Bank for a given target protein. Then, an efficient process was applied for iterative coarse‐grain model generation and evaluation at the Cα or backbone level. In this process, we construct models using interresidue spatial restraints derived from alignments by multidimensional scaling, evaluate and select models through clustering and static scoring functions, and iteratively improve the selected models by integrating spatial restraints and previous models. Finally, the full‐atom models were evaluated using molecular dynamics simulations based on structural changes under simulated heating. We have continuously improved the performance of MUFOLD by using a benchmark of 200 proteins from the Astral database, where no template with >25% sequence identity to any target protein is included. The average root‐mean‐square deviation of the best models from the native structures is 4.28 Å, which shows significant and systematic improvement over our previous methods. The computing time of MUFOLD is much shorter than many other tools, such as Rosetta. MUFOLD demonstrated some success in the 2008 community‐wide experiment for protein structure prediction CASP8. Proteins 2010. © 2009 Wiley‐Liss, Inc.  相似文献   

19.

Background

Proteins play fundamental and crucial roles in nearly all biological processes, such as, enzymatic catalysis, signaling transduction, DNA and RNA synthesis, and embryonic development. It has been a long-standing goal in molecular biology to predict the tertiary structure of a protein from its primary amino acid sequence. From visual comparison, it was found that a 2D triangular lattice model can give a better structure modeling and prediction for proteins with short primary amino acid sequences.

Methods

This paper proposes a hybrid of hill-climbing and genetic algorithm (HHGA) based on elite-based reproduction strategy for protein structure prediction on the 2D triangular lattice.

Results

The simulation results show that the proposed HHGA can successfully deal with the protein structure prediction problems. Specifically, HHGA significantly outperforms conventional genetic algorithms and is comparable to the state-of-the-art method in terms of free energy.

Conclusions

Thanks to the enhancement of local search on the global search, the proposed HHGA achieves promising results on the 2D triangular protein structure prediction problem. The satisfactory simulation results demonstrate the effectiveness of the proposed HHGA and the utility of the 2D triangular lattice model for protein structure prediction.
  相似文献   

20.
The problem of protein structure prediction in the hydrophobic-polar (HP) lattice model is the prediction of protein tertiary structure. This problem is usually referred to as the protein folding problem. This paper presents a method for the application of an enhanced hybrid search algorithm to the problem of protein folding prediction, using the three dimensional (3D) HP lattice model. The enhanced hybrid search algorithm is a combination of the particle swarm optimizer (PSO) and tabu search (TS) algorithms. Since the PSO algorithm entraps local minimum in later evolution extremely easily, we combined PSO with the TS algorithm, which has properties of global optimization. Since the technologies of crossover and mutation are applied many times to PSO and TS algorithms, so enhanced hybrid search algorithm is called the MCMPSO-TS (multiple crossover and mutation PSO-TS) algorithm. Experimental results show that the MCMPSO-TS algorithm can find the best solutions so far for the listed benchmarks, which will help comparison with any future paper approach. Moreover, real protein sequences and Fibonacci sequences are verified in the 3D HP lattice model for the first time. Compared with the previous evolutionary algorithms, the new hybrid search algorithm is novel, and can be used effectively to predict 3D protein folding structure. With continuous development and changes in amino acids sequences, the new algorithm will also make a contribution to the study of new protein sequences.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号