Similar literature
 20 similar articles found (search time: 444 ms)
1.
The amount of available protein sequence data far exceeds the experimental capacity to annotate protein function, so annotation in silico, i.e. using computational methods, is becoming increasingly important. Such annotation is inevitably a prediction, but it can be an important starting point for further experimental studies. Here we present SDPsite, a method for predicting protein functional sites based on the identification of protein specificity determinants. Taking a protein sequence alignment and a phylogenetic tree as input, the algorithm predicts conserved positions and specificity determinants, maps them onto the protein's 3D structure, and searches for clusters of the predicted positions. Comparison of the predictions with experimental data and with the performance of several other methods for functional site prediction shows that SDPsite agrees well with experiment and outperforms most of the previously available methods. SDPsite is publicly available at http://bioinf.fbb.msu.ru/SDPsite.

2.
3.
RNA secondary structure prediction is one of the classic problems of bioinformatics. The most efficient approaches to solving this problem are based on comparative analysis. As a rule, multiple RNA sequence alignment and subsequent determination of a common secondary structure are used. A new algorithm was developed to obviate the need for preliminary multiple sequence alignment. The algorithm is based on a multilevel MEME-like iterative search for a generalized profile. The search for common blocks in RNA sequences is carried out at the first level. Then the algorithm refines the chains consisting of these blocks. Finally, the search for sets of common helices, matched with alignment blocks, is carried out. The algorithm was tested with a tRNA set containing additional junk sequences and with RFN riboswitches. The algorithm is available at http://bioinf.fbb.msu.ru/RNAAlign.

4.
Prediction of transmembrane (TM) segments of amino acid sequences of membrane proteins is a well-known and very important problem. The accuracy of its solution can be improved for approaches that do not use a homology search in an additional data bank. There is a lack of tested data in this area of research, because information on the structure of membrane proteins is scarce. In this work we created a test sample of structural alignments for membrane proteins. The TM segments of these proteins were mapped according to aligned 3D structures resolved for these proteins. A method for predicting TM segments in an alignment was developed on the basis of the forward-backward algorithm from the HMM theory. This method allows a user not only to predict TM segments, but also to create a probabilistic membrane profile, which can be employed in multiple alignment procedures taking the secondary structure of proteins into account. The method was implemented in a computer program available at http://bioinf.fbb.msu.ru/fwdbck/. It provides better results than the MEMSAT method, which is nearly the only tool predicting TM segments in multiple alignments, without a homology search.
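
The forward-backward step named in this abstract can be illustrated with a minimal sketch. It is not the authors' program: it assumes a hypothetical two-state HMM (M = membrane, O = outside the membrane), a single sequence reduced to hydrophobic (h) and polar (p) symbols, and made-up probabilities. The per-position posterior of state M plays the role of a probabilistic membrane profile.

```python
def forward_backward(obs, states, start, trans, emit):
    """Posterior state probabilities for each position (scaled forward-backward)."""
    n = len(obs)
    fwd, scales, prev = [], [], None
    for t, symbol in enumerate(obs):
        cur = {}
        for s in states:
            prior = start[s] if t == 0 else sum(prev[r] * trans[r][s] for r in states)
            cur[s] = prior * emit[s][symbol]
        c = sum(cur.values())                      # scaling factor avoids underflow
        cur = {s: v / c for s, v in cur.items()}
        fwd.append(cur)
        scales.append(c)
        prev = cur
    bwd = [None] * n
    bwd[n - 1] = {s: 1.0 for s in states}
    for t in range(n - 2, -1, -1):
        bwd[t] = {
            s: sum(trans[s][r] * emit[r][obs[t + 1]] * bwd[t + 1][r] for r in states)
            / scales[t + 1]
            for s in states
        }
    # With this scaling, fwd * bwd is already the posterior at each position.
    return [{s: fwd[t][s] * bwd[t][s] for s in states} for t in range(n)]


# Hypothetical toy parameters, chosen only for illustration.
STATES = ("M", "O")
START = {"M": 0.1, "O": 0.9}
TRANS = {"M": {"M": 0.9, "O": 0.1}, "O": {"M": 0.1, "O": 0.9}}
EMIT = {"M": {"h": 0.8, "p": 0.2}, "O": {"h": 0.3, "p": 0.7}}

profile = forward_backward("ppphhhhhhppp", STATES, START, TRANS, EMIT)
membrane_profile = [round(p["M"], 3) for p in profile]
```

In this toy run the posterior for M rises above 0.5 inside the hydrophobic stretch and stays low at the polar ends. A method working on alignments would replace the single-symbol emissions with column-based ones, which is beyond this sketch.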

5.
During the evolution of proteins from a common ancestor, one functional property can be preserved while others vary, leading to functional diversity. A systematic study of the corresponding adaptive mutations provides a key to one of the most challenging problems of modern structural biology: understanding the impact of amino acid substitutions on protein function. Subfamily-specific positions (SSPs) are conserved within functional subfamilies but differ between them and therefore seem to be responsible for functional diversity in protein superfamilies. Consequently, a method for the bioinformatic analysis of sequence and structural data needs to become part of common laboratory practice for studying the structure–function relationship in proteins and developing novel protein engineering strategies. This paper describes the Zebra web server, a remote platform that implements a novel bioinformatic analysis algorithm to study diverse protein families. It is the first application that provides specificity determinants at different levels of functional classification, thereby addressing the complex functional diversity of large superfamilies. Statistical analysis is used to automatically select a set of highly significant SSPs, which can serve as hotspots for directed evolution or rational design experiments and can be analyzed to study the structure–function relationship. Zebra results are provided in two ways: (1) as a single all-in-one parsable text file and (2) as PyMol sessions with a structural representation of the SSPs. The Zebra web server is available at http://biokinet.belozersky.msu.ru/zebra.
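
The definition of an SSP given in this abstract (conserved within subfamilies, different between them) can be made concrete with a deliberately strict sketch. This is not Zebra's algorithm, which relies on statistical analysis; the function name, the input layout, and the all-or-nothing conservation rule are assumptions made for illustration.

```python
def subfamily_specific_positions(subfamilies):
    """Return 0-based alignment columns that are invariant inside every
    subfamily but carry different residues in at least two subfamilies.

    `subfamilies` maps a subfamily name to a list of aligned sequences
    of equal length (hypothetical input layout).
    """
    all_seqs = [s for seqs in subfamilies.values() for s in seqs]
    length = len(all_seqs[0])
    if any(len(s) != length for s in all_seqs):
        raise ValueError("all aligned sequences must have the same length")

    positions = []
    for col in range(length):
        # Set of residues observed in this column, per subfamily.
        residue_sets = [{s[col] for s in seqs} for seqs in subfamilies.values()]
        conserved_within = all(len(r) == 1 for r in residue_sets)
        differs_between = len(set().union(*residue_sets)) > 1
        if conserved_within and differs_between:
            positions.append(col)
    return positions


toy_alignment = {
    "subfamily_A": ["MKLV", "MKIV"],
    "subfamily_B": ["MRLV", "MRAV"],
}
print(subfamily_specific_positions(toy_alignment))  # [1]
```

Column 1 is K in one subfamily and R in the other, so it is reported. Column 0 and column 3 are conserved across the whole family, and column 2 varies within subfamilies. Real data would need a tolerance for noise, which is where a statistical score becomes necessary.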

6.
Prediction of membrane segments in the sequences of membrane proteins is a well-known and important problem. The accuracy of methods that do not use a homology search in an additional data bank can be improved. Test data in this area are scarce because few experimentally determined structures of membrane proteins are available. In this work, we create a test set of structural alignments of membrane proteins in which the positions of the membrane segments reflect the agreement of the known 3D structures of the aligned proteins. We propose a method for predicting the positions of membrane segments in a multiple alignment, based on the forward-backward algorithm from HMM theory. The method not only predicts the positions of membrane segments but also produces a probabilistic membrane profile, which can be used in multiple alignment methods that take secondary structure information into account. The method is implemented in a computer program available at http://bioinf.fbb.msu.ru/fwdbck/. It gives better results than the MEMSAT method, which is nearly the only tool for predicting membrane segments in multiple alignments without an additional homology search.

7.
The use of antigenicity scales based on physicochemical properties and the sliding window method in combination with an averaging algorithm and subsequent search for the maximum value is the classical method for B-cell epitope prediction. However, recent studies have demonstrated that the best classical methods provide a poor correlation with experimental data. We review both classical and novel algorithms and present our own implementation of the algorithms. The AAPPred software is available at http://www.bioinf.ru/aappred/.
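
The classical procedure described in this abstract (a per-residue scale, sliding-window averaging, then a search for the maximum) can be sketched in a few lines. This is not the AAPPred implementation; the two-letter scale below is a hypothetical toy, and any published antigenicity or hydrophilicity scale could be substituted for it.

```python
def sliding_window_profile(scores, window):
    """Average the scores over a centred window of odd length."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    return [
        sum(scores[i - half:i + half + 1]) / window
        for i in range(half, len(scores) - half)
    ]


def best_window(sequence, scale, window=7):
    """Return (centre index, averaged score) of the highest-scoring window."""
    scores = [scale[aa] for aa in sequence]
    profile = sliding_window_profile(scores, window)
    if not profile:
        raise ValueError("sequence is shorter than the window")
    best = max(range(len(profile)), key=profile.__getitem__)
    return best + window // 2, profile[best]


toy_scale = {"A": 0.0, "K": 1.0}  # hypothetical values, for illustration only
print(best_window("AAAKKKAAA", toy_scale, window=3))  # (4, 1.0)
```

The window centred on index 4 covers the three K residues, so it has the highest average.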

8.
EDAS, an alternatively spliced human gene database, contains data on alignment of proteins, mRNAs, and ESTs. For 8324 human genes, the database contains information on all observed exons and introns and also elementary alternatives formed therefrom. The database allows one to filter the output data by varying the cutoff threshold according to the significance level. The database is available at http://www.genebee.msu.ru/edas/.

9.
RNA secondary structure prediction is a classical problem in bioinformatics. The most efficient approach to this problem is based on comparative analysis: the algorithms use a multiple alignment of the RNA sequences to find a common RNA structure. This paper describes a new algorithm for this task that does not require a predefined multiple alignment. The algorithm is based on a MEME-like iterative search for an abstract profile at several levels. At the first level, the algorithm searches for common blocks in the RNA sequences and builds a chain of these blocks. At the next step, it refines the chain of common blocks. At the last stage, it searches for sets of common helices whose locations are consistent with the common blocks. The algorithm was tested on sets of tRNAs with a subset of junk sequences and on RFN riboswitches. The algorithm is implemented as a web server (http://bioinf.fbb.msu.ru/RNAAlign/).

10.

Background

Vitamins are typical ligands that play critical roles in various metabolic processes. The accurate identification of the vitamin-binding residues solely based on a protein sequence is of significant importance for the functional annotation of proteins, especially in the post-genomic era, when large volumes of protein sequences are accumulating quickly without being functionally annotated.

Results

In this paper, a new predictor called TargetVita is designed and implemented for predicting protein-vitamin binding residues using protein sequences. In TargetVita, features derived from the position-specific scoring matrix (PSSM), predicted protein secondary structure, and vitamin binding propensity are combined to form the original feature space; then, several feature subspaces are selected by performing different feature selection methods. Finally, based on the selected feature subspaces, heterogeneous SVMs are trained and then ensembled for performing prediction.

Conclusions

The experimental results obtained with four separate vitamin-binding benchmark datasets demonstrate that the proposed TargetVita is superior to the state-of-the-art vitamin-specific predictor, and an average improvement of 10% in terms of the Matthews correlation coefficient (MCC) was achieved over independent validation tests. The TargetVita web server and the datasets used are freely available for academic use at http://csbio.njust.edu.cn/bioinf/TargetVita or http://www.csbio.sjtu.edu.cn/bioinf/TargetVita.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-297) contains supplementary material, which is available to authorized users.
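
The Matthews correlation coefficient reported in the Conclusions of this abstract is computed from the four confusion-matrix counts. A minimal sketch follows; the function name is hypothetical, and returning 0.0 when a row or column of the confusion matrix is empty is a common convention assumed here, not something stated in the abstract.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention for the undefined case
    return (tp * tn - fp * fn) / denominator


print(matthews_corrcoef(5, 5, 0, 0))  # 1.0  (perfect prediction)
print(matthews_corrcoef(0, 0, 5, 5))  # -1.0 (every prediction wrong)
```

Unlike plain accuracy, MCC stays informative when binding residues are much rarer than non-binding ones, which is the usual situation in binding-residue prediction.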

11.
Water molecules immobilized on a protein or DNA surface are known to play an important role in intramolecular and intermolecular interactions. Comparative analysis of related three-dimensional (3D) structures makes it possible to predict the locations of such water molecules on the protein surface. We have developed and implemented the algorithm WLAKE, which detects "conserved" water molecules, i.e. those located in almost the same positions in a set of superimposed structures of related proteins or macromolecular complexes. The problem is reduced to finding maximal cliques in a certain graph. Despite the exponential complexity of the algorithm, the program runs acceptably fast for dozens of superimposed structures. WLAKE was used to predict functionally significant water molecules in enzyme active sites (transketolases) as well as in intermolecular (ETS-DNA complexes) and intramolecular (thiol-disulfide interchange protein) interactions. The program is available online at http://monkey.belozersky.msu.ru/~evgeniy/wLake/wLake.html.
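
The reduction to maximal cliques mentioned in this abstract can be illustrated as follows. The abstract does not specify how the graph is built, so this sketch makes an assumption: waters from different superimposed structures are joined by an edge when they lie within a distance cutoff. The basic Bron-Kerbosch enumeration used here is a standard technique and is not claimed to be WLAKE's own procedure.

```python
from itertools import combinations
from math import dist


def build_graph(waters, cutoff):
    """waters: list of (structure_id, (x, y, z)) after superposition.
    Connect waters from different structures that are within `cutoff`."""
    adjacency = {i: set() for i in range(len(waters))}
    for i, j in combinations(range(len(waters)), 2):
        (struct_i, pos_i), (struct_j, pos_j) = waters[i], waters[j]
        if struct_i != struct_j and dist(pos_i, pos_j) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate all maximal cliques (Bron-Kerbosch without pivoting)."""
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(clique))
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return cliques


toy_waters = [  # hypothetical coordinates in three superimposed structures
    ("A", (0.0, 0.0, 0.0)), ("B", (0.3, 0.0, 0.0)), ("C", (0.0, 0.4, 0.0)),
    ("A", (10.0, 0.0, 0.0)), ("B", (10.2, 0.0, 0.0)),
]
print(sorted(maximal_cliques(build_graph(toy_waters, cutoff=1.0))))
# [[0, 1, 2], [3, 4]]
```

A clique containing one water from most or all structures would then be a candidate conserved site. Clique enumeration is exponential in the worst case, which matches the complexity remark in the abstract.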

12.

Motivation

Genome-wide screens for structured ncRNA genes in mammals, urochordates, and nematodes have predicted thousands of putative ncRNA genes and other structured RNA motifs. A prerequisite for their functional annotation is to determine the reading direction with high precision.

Results

While folding energies of an RNA and its reverse complement are similar, the differences are sufficient at least in conjunction with substitution patterns to discriminate between structured RNAs and their complements. We present here a support vector machine that reliably classifies the reading direction of a structured RNA from a multiple sequence alignment and provides a considerable improvement in classification accuracy over previous approaches.

Software

RNAstrand is freely available as a stand-alone tool from http://www.bioinf.uni-leipzig.de/Software/RNAstrand and is also included in the latest release of RNAz, a part of the Vienna RNA Package.

13.
A universal ontology of catalytic sites is required to systematize enzyme catalytic sites, their evolution, and the relations between catalytic sites and protein families, organisms, and chemical reactions. Here we present a classification of hydrolase catalytic sites based on hierarchical organization. The web-accessible database provides information on the catalytic sites, protein folds, EC numbers, and source organisms of the enzymes and includes software for analysis and visualization of the relations between them. AVAILABILITY: http://www.enzyme.chem.msu.ru/hcs/

14.
MOTIVATION: Recognition of functional sites remains a key step in genomic DNA annotation. It is well known that a number of sites have their own specific oligonucleotide content. This suggests that the preference for site-specific nucleotide combinations at adjacent positions within a functional site could be informative for recognizing that site. Hence, Web-available resources that describe the site-specific oligonucleotide content of functional DNA sites and apply this approach to site recognition are needed. However, they have been poorly developed up to now. RESULTS: To describe the specific oligonucleotide content of functional DNA sites, we introduce oligonucleotide alphabets, from which a frequency matrix for a given site can be constructed in addition to the traditional nucleotide frequency matrix. Site recognition accuracy thereby increases. This approach was implemented in the activated MATRIX database, which accumulates oligonucleotide frequency matrices of functional DNA sites. We have demonstrated that the false-positive error of functional site recognition decreases if oligonucleotide frequency matrices are added to the commonly used nucleotide frequency matrices. AVAILABILITY: The MATRIX database is available on the Web at http://wwwmgs.bionet.nsc.ru/Dbases/MATRIX/ and at the mirror site, http://www.cbil.upenn.edu/mgs/systems/consfreq/.
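
The difference between a nucleotide frequency matrix and an oligonucleotide one can be shown with a small sketch. It assumes the simplest possible oligonucleotide alphabet, overlapping k-mers at adjacent positions of aligned sites; the alphabets used in the MATRIX database may be defined differently, and the function name and toy sites are hypothetical.

```python
from collections import Counter


def frequency_matrix(sites, k=1):
    """Position-specific frequencies of k-mers in aligned sites.
    k=1 gives the traditional nucleotide matrix, k=2 a dinucleotide matrix."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length - k + 1):
        counts = Counter(site[pos:pos + k] for site in sites)
        matrix.append({word: n / len(sites) for word, n in counts.items()})
    return matrix


toy_sites = ["TATA", "TATA", "TACA", "TTTA"]  # hypothetical aligned sites
mono = frequency_matrix(toy_sites, k=1)
di = frequency_matrix(toy_sites, k=2)
print(mono[0])  # {'T': 1.0}
print(di[0])    # {'TA': 0.75, 'TT': 0.25}
```

The dinucleotide matrix records which neighbouring pairs occur together, information that is lost when each position is counted independently.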

15.

Background

Large amounts of data are being generated by high-throughput genome sequencing methods. But the rate of the experimental functional characterization falls far behind. To fill the gap between the number of sequences and their annotations, fast and accurate automated annotation methods are required. Many methods, such as GOblet, GOFigure, and Gotcha, are designed based on the BLAST search. Unfortunately, the sequence coverage of these methods is low as they cannot detect the remote homologues. Adding to this, the lack of annotation specificity advocates the need to improve automated protein function prediction.

Results

We designed a novel automated protein functional assignment method based on the neural response algorithm, which simulates the neuronal behavior of the visual cortex in the human brain. Firstly, we predict the most similar target protein for a given query protein and thereby assign its GO term to the query sequence. When assessed on test set, our method ranked the actual leaf GO term among the top 5 probable GO terms with accuracy of 86.93%.

Conclusions

The proposed algorithm is the first instance of neural response algorithm being used in the biological domain. The use of HMM profiles along with the secondary structure information to define the neural response gives our method an edge over other available methods on annotation accuracy. Results of the 5-fold cross validation and the comparison with PFP and FFPred servers indicate the prominent performance by our method. The program, the dataset, and help files are available at http://www.jjwanglab.org/NRProF/.

16.
Functional context for biological sequence is provided in the form of annotations. However, within a group of similar sequences there can be annotation heterogeneity in terms of coverage and specificity. This in turn can introduce issues regarding the interpretation of actual functional similarity and overall functional coherence of such a group. One way to mitigate such issues is through the use of visualization and statistical techniques. Therefore, in order to help interpret this annotation heterogeneity we created a web application that generates Gene Ontology annotation graphs for protein sets and their associated statistics from simple frequencies to enrichment values and Information Content based metrics. The publicly accessible website http://xldb.di.fc.ul.pt/gryfun/ currently accepts lists of UniProt accession numbers in order to create user-defined protein sets for subsequent annotation visualization and statistical assessment. GRYFUN is a freely available web application that allows GO annotation visualization of protein sets and which can be used for annotation coherence and cohesiveness analysis and annotation extension assessments within under-annotated protein sets.
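
The abstract does not say which Information Content metric GRYFUN uses. A common definition in Gene Ontology analyses is IC(t) = -log2 p(t), where p(t) is the fraction of proteins annotated with term t or any of its descendants. The sketch below assumes that definition; the term labels and counts are hypothetical.

```python
import math


def information_content(term_counts, total):
    """IC(t) = -log2(n_t / total) for each term annotated to n_t proteins."""
    ic = {}
    for term, n in term_counts.items():
        if not 0 < n <= total:
            raise ValueError(f"count for {term} must be in 1..{total}")
        ic[term] = -math.log2(n / total)
    return ic


# Hypothetical counts for a set of 8 proteins.
counts = {"root_term": 8, "mid_term": 2, "leaf_term": 1}
ic = information_content(counts, total=8)
print(ic["mid_term"], ic["leaf_term"])  # 2.0 3.0
```

A term shared by every protein carries no information (IC of zero), while rarer and more specific terms score higher. This is why IC is useful for judging the specificity of annotations within a protein set.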

17.
18.
We present GraphProt, a computational framework for learning sequence- and structure-binding preferences of RNA-binding proteins (RBPs) from high-throughput experimental data. We benchmark GraphProt, demonstrating that the modeled binding preferences conform to the literature, and showcase the biological relevance and two applications of GraphProt models. First, estimated binding affinities correlate with experimental measurements. Second, predicted Ago2 targets display higher levels of expression upon Ago2 knockdown, whereas control targets do not. Computational binding models, such as those provided by GraphProt, are essential for predicting RBP binding sites and affinities in all tissues. GraphProt is freely available at http://www.bioinf.uni-freiburg.de/Software/GraphProt.

19.
A software information system called Protein Structure Discovery was developed. The system can be used to solve a wide range of tasks in computational proteomics, including prediction of the function, structure, and immune properties of proteins. A special section of the system allows evaluation of the quantitative and qualitative effects of mutations on the structural and functional properties of proteins. Nineteen different programs are integrated into the system, including PDBSite, a database of protein functional sites; PDBSiteScan, a program for predicting functional sites in three-dimensional protein structures; and Web-ProAnalyst, a program for quantitative analysis of the structure-activity relationships of proteins. Protein Structure Discovery has a Web interface and is available to users via the Internet (http://www-bionet.sscc.ru/psd/). For example, recognition of zinc ion and ADP binding sites in model structures showed that the method is highly robust to errors in the reconstruction of the spatial structures of proteins.

20.
Protein-nucleotide interactions are ubiquitous in a wide variety of biological processes. Accurately identifying interaction residues solely from protein sequences is useful for both protein function annotation and drug design, especially in the post-genomic era, as large volumes of protein data have not been functionally annotated. Protein-nucleotide binding residue prediction is a typical imbalanced learning problem, in which binding residues are far fewer in number than non-binding residues. Alleviating the severity of class imbalance has been demonstrated to be a promising means of improving the prediction performance of machine-learning-based predictors on class imbalance problems. However, little attention has been paid to the negative impact of class imbalance on protein-nucleotide binding residue prediction. In this study, we propose a new supervised over-sampling algorithm that synthesizes additional minority class samples to address class imbalance. The experimental results from protein-nucleotide interaction datasets demonstrate that the proposed supervised over-sampling algorithm can relieve the severity of class imbalance and help to improve prediction performance. Based on the proposed over-sampling algorithm, a predictor, called TargetSOS, is implemented for protein-nucleotide binding residue prediction. Cross-validation tests and independent validation tests demonstrate the effectiveness of TargetSOS. The web-server and datasets used in this study are freely available at http://www.csbio.sjtu.edu.cn/bioinf/TargetSOS/.
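
The idea of synthesizing extra minority-class samples can be illustrated with a generic interpolation scheme in the spirit of SMOTE. This is explicitly not the supervised over-sampling algorithm of TargetSOS, whose details the abstract does not give. The function name and the toy feature vectors are hypothetical.

```python
import random


def oversample_minority(minority, n_new, rng=None):
    """Create `n_new` synthetic samples, each a random point on the segment
    between two distinct, randomly chosen minority-class samples."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Hypothetical 2-D feature vectors of binding (minority) residues.
binding = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0)]
extra = oversample_minority(binding, n_new=5, rng=random.Random(1))
balanced_minority = binding + extra
```

Each synthetic point lies between two real minority samples, so the minority class grows without exact duplicates. A supervised variant would additionally use class information to decide which synthetic samples to keep.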
