Similar Articles
20 similar articles found (search time: 28 ms)
1.
Zheng S, Robertson TA, Varani G. The FEBS Journal, 2007, 274(24): 6378-6391
RNA-protein interactions are fundamental to gene expression. Thus, the molecular basis for the sequence dependence of protein-RNA recognition has been extensively studied experimentally. However, there have been very few computational studies of this problem, and no sustained attempt has been made towards using computational methods to predict or alter the sequence-specificity of these proteins. In the present study, we provide a distance-dependent statistical potential function derived from our previous work on protein-DNA interactions. This potential function discriminates native structures from decoys, successfully predicts the native sequences recognized by sequence-specific RNA-binding proteins, and recapitulates experimentally determined relative changes in binding energy due to mutations of individual amino acids at protein-RNA interfaces. Thus, this work demonstrates that statistical models allow the quantitative analysis of protein-RNA recognition based on their structure and can be applied to modeling protein-RNA interfaces for prediction and design purposes.
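A distance-dependent statistical potential of this kind is usually obtained by inverse-Boltzmann conversion of observed contact counts. The sketch below is a generic illustration, not the authors' function: it assumes a simple independence ("quasi-chemical") reference state, a pseudocount of 1, and hypothetical atom-type names and distance-bin indices.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Key: (protein atom type, RNA atom type, distance bin index); names are hypothetical.
Key = Tuple[str, str, int]


def statistical_potential(counts: Dict[Key, int], pseudocount: float = 1.0) -> Dict[Key, float]:
    """Inverse-Boltzmann energies (in kT) relative to an assumed independence reference state."""
    total = sum(counts.values())
    pair_tot: Dict[Tuple[str, str], int] = defaultdict(int)
    bin_tot: Dict[int, int] = defaultdict(int)
    for (a, b, r), n in counts.items():
        pair_tot[(a, b)] += n
        bin_tot[r] += n
    energy: Dict[Key, float] = {}
    for (a, b), n_pair in pair_tot.items():
        for r, n_bin in bin_tot.items():
            observed = counts.get((a, b, r), 0) + pseudocount
            # Expected count if atom-pair type and distance bin were independent.
            expected = n_pair * n_bin / total + pseudocount
            energy[(a, b, r)] = -math.log(observed / expected)
    return energy


def score_contacts(energy: Dict[Key, float], contacts: Iterable[Key]) -> float:
    """Sum pair energies over interface contacts; lower means more native-like."""
    return sum(energy.get(c, 0.0) for c in contacts)
```

Over-represented contacts (for example, arginine nitrogens close to phosphate oxygens in the toy counts) receive negative energies, so native interfaces should score lower than decoys.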

2.
3.
Prediction of RNA binding sites in proteins from amino acid sequence (times cited: 3; self-citations: 0; cited by others: 3)
RNA-protein interactions are vitally important in a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed a computational tool for predicting which amino acids of an RNA binding protein participate in RNA-protein interactions, using only the protein sequence as input. RNABindR was developed using machine learning on a validated nonredundant data set of interfaces from known RNA-protein complexes in the Protein Data Bank. It generates a classifier that captures primary sequence signals sufficient for predicting which amino acids in a given protein are located in the RNA-protein interface. In leave-one-out cross-validation experiments, RNABindR identifies interface residues with >85% overall accuracy. It can be calibrated by the user to obtain either high specificity or high sensitivity for interface residues. RNABindR, implementing a Naive Bayes classifier, performs as well as a more complex neural network classifier (to our knowledge, the only previously published sequence-based method for RNA binding site prediction) and offers the advantages of speed, simplicity and interpretability of results. RNABindR predictions on the human telomerase protein hTERT are in good agreement with experimental data. The availability of computational tools for predicting which residues in an RNA binding protein are likely to contact RNA should facilitate design of experiments to directly test RNA binding function and contribute to our understanding of the diversity, mechanisms, and regulation of RNA-protein complexes in biological systems. (RNABindR is available as a Web tool from http://bindr.gdcb.iastate.edu.).
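A sequence-window Naive Bayes classifier of the general kind described here can be sketched in a few lines. This is an illustrative assumption, not RNABindR's actual feature set or parameters: each residue is represented by the amino-acid identities in a window centred on it, Laplace smoothing is used, and the toy training sequences are hypothetical. The `threshold` argument mirrors the user calibration between sensitivity and specificity.

```python
import math
from collections import defaultdict
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # padding symbol for windows that run off either end of the chain


def windows(seq: str, half: int) -> List[str]:
    """One window of length 2*half+1 centred on each residue."""
    padded = PAD * half + seq + PAD * half
    return [padded[i:i + 2 * half + 1] for i in range(len(seq))]


class WindowNaiveBayes:
    """Naive Bayes over residue identities at each window position (label 1 = interface)."""

    def __init__(self, half: int = 1, alpha: float = 1.0):
        self.half = half
        self.alpha = alpha  # Laplace smoothing pseudocount

    def fit(self, seqs: List[str], labels: List[List[int]]) -> "WindowNaiveBayes":
        width = 2 * self.half + 1
        self.class_counts = [0, 0]
        self.counts = [[defaultdict(int) for _ in range(width)] for _ in range(2)]
        for seq, lab in zip(seqs, labels):
            for win, y in zip(windows(seq, self.half), lab):
                self.class_counts[y] += 1
                for pos, aa in enumerate(win):
                    self.counts[y][pos][aa] += 1
        return self

    def log_odds(self, win: str) -> float:
        """log P(interface, window) - log P(non-interface, window)."""
        vocab = len(AMINO_ACIDS) + 1  # 20 residues + padding symbol
        total = sum(self.class_counts)
        log_post = []
        for y in (0, 1):
            s = math.log((self.class_counts[y] + self.alpha) / (total + 2 * self.alpha))
            for pos, aa in enumerate(win):
                s += math.log((self.counts[y][pos][aa] + self.alpha)
                              / (self.class_counts[y] + self.alpha * vocab))
            log_post.append(s)
        return log_post[1] - log_post[0]

    def predict(self, seq: str, threshold: float = 0.0) -> List[int]:
        """Raising the threshold trades sensitivity for specificity."""
        return [int(self.log_odds(w) > threshold) for w in windows(seq, self.half)]


model = WindowNaiveBayes(half=1).fit(["KRKRKR", "LLLLLL"], [[1] * 6, [0] * 6])
print(model.predict("KRKLLL"))  # [1, 1, 1, 0, 0, 0]
```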

4.
There are hundreds of RNA binding proteins in the human genome alone, and their interactions with messenger and other RNAs in a cell regulate every step in an RNA's life cycle. To understand this interplay of proteins and RNA, it is important to know which protein binds which RNA, how strongly, and where. Here, we introduce RBPBind, a web-based tool for the quantitative prediction of the interaction of single-stranded RNA binding proteins with target RNAs that fully takes into account the effect of RNA secondary structure on binding affinity. Given a user-specified RNA and a protein selected from a set of several RNA-binding proteins, RBPBind computes their binding curve and effective binding constant. The server also computes the probability that, at a given protein concentration, a protein molecule will bind to any particular nucleotide along the RNA. The sequence specificity of the protein-RNA interaction is parameterized from public RNAcompete experiments and integrated into the recursions of the Vienna RNA package to simultaneously take into account protein binding and RNA secondary structure. We validate our approach by comparison to experimentally determined binding affinities of the HuR protein for several RNAs of different sequence contexts from the literature, showing that integration of raw sequence affinities into RNA secondary structure prediction significantly improves the agreement between computationally predicted and experimentally measured binding affinities. Our resource thus provides a quick and easy way to obtain reliable predicted binding affinities and locations for single-stranded RNA binding proteins based on RNA sequence alone.
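The idea of folding protein binding into a partition function can be illustrated with a much simpler model. The sketch below deliberately ignores RNA secondary structure (which RBPBind includes through the Vienna RNA recursions): proteins occupy non-overlapping footprints on an unstructured RNA, each with weight `conc / Kd` for the motif it covers. The `kd` table, footprint length and concentrations are hypothetical. A forward/backward recursion gives the partition function Z and the per-nucleotide binding probability.

```python
from typing import Dict, List, Tuple


def binding_profile(rna: str, kd: Dict[str, float], conc: float,
                    footprint: int) -> Tuple[float, List[float]]:
    """Partition function Z and per-nucleotide occupancy for non-overlapping proteins.

    kd maps a footprint-length RNA motif to a dissociation constant (same units
    as conc); motifs absent from kd are treated as non-binding.
    """
    n = len(rna)
    # Statistical weight of one protein bound starting at position s.
    w = [0.0] * n
    for s in range(n - footprint + 1):
        k = kd.get(rna[s:s + footprint])
        if k:
            w[s] = conc / k
    # F[i]: partition function of the prefix rna[:i].
    F = [1.0] + [0.0] * n
    for i in range(1, n + 1):
        F[i] = F[i - 1]  # nucleotide i-1 left free
        s = i - footprint
        if s >= 0 and w[s] > 0:
            F[i] += F[s] * w[s]  # a protein footprint ends at nucleotide i-1
    # B[i]: partition function of the suffix rna[i:].
    B = [0.0] * n + [1.0]
    for i in range(n - 1, -1, -1):
        B[i] = B[i + 1]
        if i + footprint <= n and w[i] > 0:
            B[i] += w[i] * B[i + footprint]
    Z = F[n]
    occupancy = [0.0] * n
    for s in range(n - footprint + 1):
        if w[s] > 0:
            p = F[s] * w[s] * B[s + footprint] / Z  # P(a protein is bound at s)
            for j in range(s, s + footprint):
                occupancy[j] += p
    return Z, occupancy


def fraction_bound(Z: float) -> float:
    """Probability that at least one protein is bound (the empty state has weight 1)."""
    return 1.0 - 1.0 / Z
```

For example, `binding_profile("AAAA", {"AA": 1.0}, conc=1.0, footprint=2)` gives Z = 5 and occupancies of about 0.4, 0.6, 0.6, 0.4. Evaluating `fraction_bound` over a range of concentrations yields a binding curve, and the concentration at which it reaches 0.5 can serve as an effective dissociation constant.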

5.
Ellis JJ, Jones S. Proteins, 2008, 70(4): 1518-1526
Many protein-RNA recognition events are known to exhibit conformational changes from qualitative observations of individual complexes. However, a quantitative estimation of conformational changes is required if protein-RNA docking and template-based methods for RNA binding site prediction are to be developed. This study presents the first quantitative evaluation of conformational changes that occur when proteins bind RNA. The analysis of twelve RNA-binding proteins in the bound and unbound states using error-scaled difference distance matrices is presented. The binding site residues are mapped to each structure, and the conformational changes that affect these residues are evaluated. Of the twelve proteins, four exhibit greater movements in nonbinding site residues, and a further four show the greatest movements in binding site residues. The remaining four proteins display no significant conformational change. When interface residues are found in conformationally variable regions of the protein, they typically move less than 2 Å between the bound and unbound conformations. The current data indicate that conformational changes in the binding site residues of RNA binding proteins may not be as significant as previously suggested, but a larger data set is required before wider conclusions may be drawn. The implications of the observed conformational changes for protein function prediction are discussed.
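An error-scaled difference distance matrix compares every intra-protein distance in the bound and unbound structures and divides the change by its estimated uncertainty. The sketch below assumes a simplified error model (independent per-residue coordinate errors combined in quadrature) and a hypothetical rule for flagging moving residues; it is not the exact procedure used in the study.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Coord]) -> List[List[float]]:
    return [[math.dist(a, b) for b in coords] for a in coords]


def error_scaled_ddm(bound: Sequence[Coord], unbound: Sequence[Coord],
                     err_bound: Sequence[float], err_unbound: Sequence[float]) -> List[List[float]]:
    """(d_bound - d_unbound) / sigma for every residue pair; diagonal left at 0."""
    db, du = distance_matrix(bound), distance_matrix(unbound)
    n = len(bound)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Assumed error model: independent per-residue coordinate errors.
                sigma = math.sqrt(err_bound[i] ** 2 + err_bound[j] ** 2
                                  + err_unbound[i] ** 2 + err_unbound[j] ** 2)
                out[i][j] = (db[i][j] - du[i][j]) / sigma
    return out


def moving_residues(esddm: List[List[float]], cutoff: float = 2.0) -> List[int]:
    """Hypothetical rule: flag residues whose distances to most others change significantly."""
    n = len(esddm)
    flagged = []
    for i in range(n):
        hits = sum(1 for j in range(n) if j != i and abs(esddm[i][j]) > cutoff)
        if hits > (n - 1) / 2:
            flagged.append(i)
    return flagged
```

A residue that shifts relative to an otherwise rigid scaffold changes its distance to nearly every other residue, which is why the per-residue count of large entries separates it from residues that merely appear in one of its pairs.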

6.
Silverman IM, Li F, Alexander A, Goff L, Trapnell C, Rinn JL, Gregory BD. Genome Biology, 2014, 15(1): 1-16

Background

Sequence specific RNA binding proteins are important regulators of gene expression. Several related crosslinking-based, high-throughput sequencing methods, including PAR-CLIP, have recently been developed to determine direct binding sites of global protein-RNA interactions. However, no studies have quantitatively addressed the contribution of background binding to datasets produced by these methods.

Results

We measured non-specific RNA background in PAR-CLIP data, demonstrating that covalently crosslinked background binding is common, reproducible and apparently universal among laboratories. We show that quantitative determination of background is essential for identifying targets of most RNA-binding proteins and can substantially improve motif analysis. We also demonstrate that by applying background correction to an RNA binding protein of unknown binding specificity, Caprin1, we can identify a previously unrecognized RNA recognition element not otherwise apparent in a PAR-CLIP study.

Conclusions

Empirical background measurements of global RNA-protein crosslinking are a necessary addendum to other experimental controls, such as performing replicates, because covalently crosslinked background signals are reproducible and otherwise unavoidable. Recognizing and quantifying the contribution of background extends the utility of PAR-CLIP and can improve mechanistic understanding of protein-RNA specificity, protein-RNA affinity and protein-RNA association dynamics.
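One simple way to picture background correction is to normalize read counts at each candidate site in both the RBP PAR-CLIP library and a background library, then keep only sites that are clearly enriched over background. The sketch below is a hypothetical enrichment filter (counts-per-million normalization, pseudocount and log2 cutoff are all assumptions), not the statistical model used in the study.

```python
import math
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    total = sum(counts.values())
    return {site: n * 1e6 / total for site, n in counts.items()}


def background_corrected_sites(rbp_counts: Dict[str, int], background_counts: Dict[str, int],
                               min_log2_enrichment: float = 1.0,
                               pseudocount: float = 1.0) -> Dict[str, float]:
    """Return {site: log2 enrichment} for sites enriched over the background library."""
    rbp = counts_per_million(rbp_counts)
    bg = counts_per_million(background_counts)
    kept = {}
    for site, cpm in rbp.items():
        # Sites never seen in the background get only the pseudocount there.
        enrichment = math.log2((cpm + pseudocount) / (bg.get(site, 0.0) + pseudocount))
        if enrichment >= min_log2_enrichment:
            kept[site] = enrichment
    return kept
```

A site that is abundant in the RBP library but equally abundant in the background (a reproducible, non-specific crosslink) is discarded, which is the effect the study argues improves target and motif identification.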

7.
Zhao H, Yang Y, Zhou Y. Nucleic Acids Research, 2011, 39(8): 3017-3025
Mechanistic understanding of many key cellular processes often involves identification of RNA binding proteins (RBPs) and RNA binding sites in two separate steps. Here, they are predicted simultaneously by structural alignment to known protein-RNA complex structures followed by binding assessment with a DFIRE-based statistical energy function. This method achieves 98% accuracy and 91% precision for predicting RBPs and 93% accuracy and 78% precision for predicting RNA-binding amino-acid residues for a large benchmark of 212 RNA binding and 6761 non-RNA binding domains (leave-one-out cross-validation). Additional tests revealed that the method makes no false positive prediction from 311 DNA binding domains but correctly detects six domains binding with both DNA and RNA. In addition, it correctly identified 31 of 75 unbound RNA-binding domains with 92% accuracy and 65% precision for predicted binding residues, and achieved an 86% success rate when applied to the SCOP (Structural Classification of Proteins) RNA-binding domain superfamily. It further predicts 25 of 2076 structural genomics targets as RBPs; 20 of these 25 (80%) are putatively RNA binding. The superior performance over existing methods indicates the importance of dividing structures into domains, using a Z-score to measure relative structural similarity, and using a statistical energy function to measure protein-RNA binding affinity.
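The Z-score mentioned above expresses how far an alignment score lies above the distribution of scores against a reference set, which makes similarity comparable across templates of different sizes. A minimal sketch follows; the two-stage decision rule and its cutoffs (`z_cut`, `energy_cut`) are hypothetical placeholders, not the published thresholds.

```python
import statistics
from typing import Sequence


def z_score(score: float, reference_scores: Sequence[float]) -> float:
    """Number of standard deviations a score lies above a reference distribution."""
    mu = statistics.mean(reference_scores)
    sd = statistics.stdev(reference_scores)  # sample SD; needs at least 2 scores
    return (score - mu) / sd


def predict_rbp(z: float, binding_energy: float,
                z_cut: float = 3.0, energy_cut: float = -5.0) -> bool:
    """Hypothetical rule: strong structural match AND favourable predicted binding energy."""
    return z >= z_cut and binding_energy <= energy_cut
```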

8.
Understanding the molecular mechanism of protein-RNA recognition and complex formation is a major challenge in structural biology. Unfortunately, the experimental determination of protein-RNA complexes by X-ray crystallography and nuclear magnetic resonance spectroscopy (NMR) is tedious and difficult. Alternatively, protein-RNA interactions can be predicted by computational methods. Although less accurate than experimental observations, computational predictions can be sufficiently accurate to prompt functional hypotheses and guide experiments, e.g. to identify individual amino acid or nucleotide residues. In this article we review 10 methods for predicting protein-RNA interactions, seven of which predict RNA-binding sites from protein sequences and three from structures. We also developed a meta-predictor that uses the output of the top three sequence-based primary predictors to calculate a consensus prediction, which outperforms all the primary predictors. To cover the software for predicting protein-RNA interactions fully, we also describe five methods for protein-RNA docking. The article highlights the strengths and shortcomings of existing methods for the prediction of protein-RNA interactions and provides suggestions for their further development.
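A consensus meta-predictor can be as simple as averaging per-residue scores from several primary predictors and thresholding the mean. The sketch below assumes each predictor already outputs a score in [0, 1] for every residue; the averaging rule and the 0.5 threshold are illustrative choices, not the review's actual combination scheme.

```python
from typing import List, Sequence


def consensus(score_lists: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average per-residue scores from several predictors and call binding residues."""
    n = len(score_lists[0])
    if any(len(scores) != n for scores in score_lists):
        raise ValueError("all predictors must score the same residues")
    mean_scores = [sum(scores[i] for scores in score_lists) / len(score_lists) for i in range(n)]
    return [1 if s >= threshold else 0 for s in mean_scores]
```

Averaging lets a confident prediction from one method be outvoted by the others, which is one reason consensus predictions can beat each individual predictor when their errors are not strongly correlated.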

9.
Background

RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction.

Results

We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naive Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows.

Our results show that (i) sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) structure-based classifiers that use a smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as a sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues.

Conclusions

Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs, as well as including evaluations on independent test sets in the comparisons.
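The performance measures used in this comparison (Sensitivity, Specificity, MCC) and the gap between residue-level and protein-level evaluation can be made concrete with a short sketch. Here "residue level" pools all residues from all proteins before computing MCC, while "protein level" averages per-protein MCCs; these two definitions are a common convention and are assumed rather than taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0  # report 0 when a measure is undefined


def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    tp, tn, fp, fn = confusion(y_true, y_pred)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }


def residue_vs_protein_mcc(proteins: List[Tuple[List[int], List[int]]]) -> Tuple[float, float]:
    """Pooled (residue-level) MCC versus the mean of per-protein MCCs."""
    all_true = [t for y_true, _ in proteins for t in y_true]
    all_pred = [p for _, y_pred in proteins for p in y_pred]
    pooled = metrics(all_true, all_pred)["mcc"]
    per_protein = sum(metrics(t, p)["mcc"] for t, p in proteins) / len(proteins)
    return pooled, per_protein
```

With one perfectly predicted protein and one poorly predicted protein, the pooled MCC is 11/21 (about 0.52) while the per-protein mean is 0.4, illustrating how the two levels of evaluation can diverge.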

10.
RNA-binding proteins (RBPs) bind RNA and participate in forming ribonucleoprotein complexes. They play crucial roles in various biological processes such as RNA splicing, editing, transport, maintenance, degradation, intracellular localization and translation. RBPs bind RNA with different sequence specificities and affinities; thus, identification of protein binding sites on RNAs (R-PBSs) will deepen our understanding of RNA-protein interactions. Currently, high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP, also known as CLIP-Seq) is one of the most powerful methods for mapping RNA-protein binding sites or RNA modification sites. However, this method can only map the sites of a single known RBP at a time, and it requires an antibody against that RBP. Here we developed a novel method, called capture of protein binding sites on RNAs (RPBS-Cap), to identify genome-wide protein binding sites on RNAs without using antibodies. The RPBS-Cap assay uses a "double click" strategy: proteins and RNAs are first UV-crosslinked in vivo, and the proteins are then crosslinked to magnetic beads. The protein-associated RNA elements are captured, reverse transcribed and sequenced. Our approach has potential applications for studying genome-wide RNA-protein interactions.

11.
The molecular architecture of protein-RNA interfaces is analyzed using a non-redundant dataset of 152 protein-RNA complexes. We find that an average protein-RNA interface is smaller than an average protein-DNA interface but larger than an average protein–protein interface. Among the different classes of protein-RNA complexes, interfaces with tRNA are the largest, while interfaces with single-stranded RNA are the smallest. Significantly, RNA contributes more to the interface area than its partner protein. Moreover, unlike protein–protein interfaces, where the side chains contribute less to the interface area than the main chain, the main-chain and side-chain contributions are reversed in protein-RNA interfaces. We find that the protein surface in contact with RNA in protein-RNA complexes is better packed than that in contact with DNA in protein-DNA complexes, but less tightly packed than that in contact with protein in protein–protein complexes. Shape complementarity and electrostatic potential are the two major factors that determine the specificity of the protein-RNA interaction. We find that the H-bond density at protein-RNA interfaces is similar to that of protein-DNA interfaces but higher than that of protein–protein interfaces. Unlike protein-DNA interfaces, where the deoxyribose plays little role in intermolecular H-bonds, the ribose in RNA plays a significant role in protein-RNA H-bonds owing to the oxygen atom at its 2′ position. We find that besides H-bonds, salt bridges and stacking interactions also play a significant role in stabilizing protein-nucleic acid interfaces; however, their contribution at protein–protein interfaces is insignificant.
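Interface size and H-bond density are typically derived from solvent-accessible surface areas (SASA) computed for each partner alone and in the complex. The sketch below takes those SASA values as given inputs (computing SASA itself requires a structure tool) and assumes that H-bond density is expressed per 100 Å² of total buried area; other normalizations are also in use. All numbers in the example are hypothetical.

```python
from typing import Dict


def buried_area(sasa_free: float, sasa_in_complex: float) -> float:
    """SASA (Å²) lost by one partner on complex formation."""
    return sasa_free - sasa_in_complex


def interface_summary(protein_free: float, protein_bound: float,
                      rna_free: float, rna_bound: float, n_hbonds: int) -> Dict[str, float]:
    protein_bsa = buried_area(protein_free, protein_bound)
    rna_bsa = buried_area(rna_free, rna_bound)
    total = protein_bsa + rna_bsa
    return {
        "protein_bsa": protein_bsa,
        "rna_bsa": rna_bsa,
        "rna_fraction": rna_bsa / total,  # > 0.5 when RNA contributes more area
        # Assumed convention: H-bonds per 100 Å² of total buried area.
        "hbond_density": 100.0 * n_hbonds / total,
    }


print(interface_summary(10000, 9200, 5000, 4100, n_hbonds=17))
```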

12.
13.
14.
We present a high throughput, versatile approach to identify RNA-protein interactions and to determine nucleotides important for specific protein binding. In this approach, oligonucleotides are coupled to microbeads and hybridized to RNA-protein complexes. The presence or absence of RNA and/or protein fluorescence indicates the formation of an oligo-RNA-protein complex on each bead. The observed fluorescence is specific for both the hybridization and the RNA-protein interaction. We find that the method can discriminate noncomplementary and mismatch sequences. The observed fluorescence reflects the affinity and specificity of the RNA-protein interaction. In addition, the fluorescence patterns footprint the protein recognition site to determine nucleotides important for protein binding. The system was developed with the human protein U1A binding to RNAs derived from U1 snRNA but can also detect RNA-protein interactions in total RNA backgrounds. We propose that this strategy, in combination with emerging coded bead systems, can identify RNAs and RNA sequences important for interacting with RNA-binding proteins on genomic scales.

15.
The coat proteins of different single-strand RNA phages utilize a common structural framework to recognize different RNA targets, making them suitable models for studies of RNA-protein recognition generally, especially for the class of proteins that bind RNA on a beta-sheet surface. Here we show that structurally distinct molecules are capable of satisfying the requirements for binding to Qbeta coat protein. Although the predicted secondary structures of the RNAs differ markedly, we contend that they are approximately equivalent structurally in their complexes with coat protein. Based on our prior observations that the RNA-binding specificities of Qbeta and MS2 coat proteins can be interconverted with as few as one amino acid substitution each, and taking into account details of the structures of complexes of MS2 coat protein with wild-type and aptamer RNAs, we propose a model for the Qbeta coat protein-RNA complex.

16.
RNA-protein interactions (times cited: 1; self-citations: 0; cited by others: 1)
Recent discoveries have revealed that there is a myriad of RNAs and associated RNA-binding proteins that spatially and temporally appear in the cells of all organisms. The structures of these RNA-protein complexes are providing valuable insights into the binding modes and functional implications of these interactions. Even the common RNA-binding domains (RBDs) and the double stranded RNA binding motifs (dsRBMs) have been shown to exhibit a plethora of binding modes.

17.
We investigate the sequence and structural properties of RNA-protein interaction sites in 211 RNA-protein chain pairs, the largest set of RNA-protein complexes analyzed to date. Statistical analysis confirms and extends earlier analyses made on smaller data sets. Of the hydrogen bonds between RNA and protein, 24.6% are nucleobase-specific, indicating the importance of both nucleobase-specific and -nonspecific interactions. While there is no significant difference between RNA base frequencies in protein-binding and non-binding regions, distinct preferences for RNA bases, RNA structural states, protein residues, and protein secondary structure emerge when nucleobase-specific and -nonspecific interactions are considered separately. Guanine nucleobase and unpaired RNA structural states are significantly preferred in nucleobase-specific interactions; however, nonspecific interactions disfavor guanine, while still favoring unpaired RNA structural states. The opposite preferences of nucleobase-specific and -nonspecific interactions for guanine may explain discrepancies between earlier studies with regard to base preferences in RNA-protein interaction regions. Preferences for amino acid residues differ significantly between nucleobase-specific and -nonspecific interactions, with nonspecific interactions showing the expected bias towards positively charged residues. Irregular protein structures are strongly favored in interactions with the protein backbone, whereas there is little preference for specific protein secondary structure in either nucleobase-specific or -nonspecific interactions. Overall, this study shows strong preferences for both RNA bases and RNA structural states in protein-RNA interactions, indicating their mutual importance in protein recognition.
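Whether base frequencies differ between protein-binding and non-binding regions is a contingency-table question. The sketch below computes the Pearson chi-square statistic for a 2 x 4 table (A, C, G, U counts in each region) and compares it with the standard 5% critical value for 3 degrees of freedom. The counts are hypothetical, and the study may have used a different test.

```python
from typing import Sequence

CHI2_CRIT_DF3_P05 = 7.815  # chi-square critical value, df = 3, alpha = 0.05


def chi_square_2xk(row_a: Sequence[int], row_b: Sequence[int]) -> float:
    """Pearson chi-square statistic for a 2 x k contingency table."""
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    tot_a, tot_b = sum(row_a), sum(row_b)
    n = tot_a + tot_b
    stat = 0.0
    for row, row_tot in ((row_a, tot_a), (row_b, tot_b)):
        for obs, col_tot in zip(row, col_totals):
            expected = row_tot * col_tot / n
            if expected > 0:  # skip bases absent from both regions
                stat += (obs - expected) ** 2 / expected
    return stat


binding = [40, 10, 10, 40]      # hypothetical A, C, G, U counts in binding regions
non_binding = [10, 40, 40, 10]  # hypothetical counts in non-binding regions
print(chi_square_2xk(binding, non_binding) > CHI2_CRIT_DF3_P05)  # True
```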

18.
Phipps KR, Li H. Proteins, 2007, 67(1): 121-127
The crystal packing surfaces comprising protein-RNA interactions were analyzed for 50 RNA-protein crystal structures in the Protein Data Bank database. Protein-RNA crystal contacts, which represent nonspecific protein-RNA interfaces, were investigated for their amino acid propensities, hydrogen bond patterns, and backbone and side chain interactions. When compared to biologically relevant interactions, the protein-RNA crystal contacts exhibit similarities as well as differences with respect to the principles of protein-RNA interactions. Similar to what was observed at cognate protein-RNA interfaces, positively charged amino acids have high propensities at noncognate protein-RNA interfaces and preferentially form hydrogen bonds with RNA phosphate groups. In contrast, nonpolar residues are less frequently associated with noncognate interactions. These results highlight the important roles of both electrostatic and hydrogen bonding interactions, facilitated by positively charged amino acids, in mediating both specific and nonspecific protein-RNA interactions.
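An amino acid propensity compares how often a residue type occurs at an interface with how often it occurs on the protein surface overall; values above 1 indicate enrichment. The sketch below uses that common definition with hypothetical counts; the study's exact reference set may differ.

```python
from typing import Dict


def propensities(interface: Dict[str, int], surface: Dict[str, int]) -> Dict[str, float]:
    """(fraction of interface residues of type aa) / (fraction of surface residues of type aa)."""
    n_int, n_surf = sum(interface.values()), sum(surface.values())
    return {aa: (interface.get(aa, 0) / n_int) / (n_surf_aa / n_surf)
            for aa, n_surf_aa in surface.items() if n_surf_aa > 0}


props = propensities({"R": 30, "L": 10, "G": 60}, {"R": 100, "L": 200, "G": 700})
# Arginine is about 3-fold enriched; leucine is depleted (about 0.5).
```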

19.
Elucidation of the interaction of proteins with different molecules is of significance in the understanding of cellular processes. Computational methods have been developed for the prediction of protein-protein interactions. However, insufficient attention has been paid to the prediction of protein-RNA interactions, which play central roles in regulating gene expression and certain RNA-mediated enzymatic processes. This work explored the use of a machine learning method, support vector machines (SVM), for the prediction of RNA-binding proteins directly from their primary sequence. Based on known RNA-binding and non-RNA-binding proteins, an SVM system was trained to recognize RNA-binding proteins. A total of 4011 RNA-binding and 9781 non-RNA-binding proteins were used to train and test the SVM classification system, and an independent set of 447 RNA-binding and 4881 non-RNA-binding proteins was used to evaluate the classification accuracy. Testing results using this independent evaluation set show a prediction accuracy of 94.1%, 79.3%, and 94.1% for rRNA-, mRNA-, and tRNA-binding proteins, and 98.7%, 96.5%, and 99.9% for non-rRNA-, non-mRNA-, and non-tRNA-binding proteins, respectively. The SVM classification system was further tested on a small class of snRNA-binding proteins with only 60 available sequences. The prediction accuracy is 40.0% and 99.9% for snRNA-binding and non-snRNA-binding proteins, respectively, indicating that a sufficient number of proteins is needed to train the SVM. The SVM classification systems trained in this work were added to our Web-based protein functional classification software SVMProt, at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi. Our study suggests the potential of SVM as a useful tool for facilitating the prediction of protein-RNA interactions.
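Sequence-based SVM classifiers need a fixed-length feature vector per protein regardless of sequence length. The sketch below shows only that feature-extraction step, using amino-acid composition plus a charge-group composition; these descriptors and the charge grouping (K, R, H positive; D, E negative) are illustrative assumptions, not SVMProt's actual descriptor set, and the SVM training itself is not shown.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CHARGE_GROUPS = {"positive": "KRH", "negative": "DE"}  # assumed grouping


def composition(seq: str) -> Dict[str, float]:
    """Fraction of each standard amino acid; non-standard letters are ignored."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: residues.count(aa) / len(residues) for aa in AMINO_ACIDS}


def charge_composition(seq: str) -> Dict[str, float]:
    comp = composition(seq)
    pos = sum(comp[aa] for aa in CHARGE_GROUPS["positive"])
    neg = sum(comp[aa] for aa in CHARGE_GROUPS["negative"])
    return {"positive": pos, "negative": neg, "neutral": 1.0 - pos - neg}


def feature_vector(seq: str) -> List[float]:
    """23 numbers per protein: 20 composition values + 3 charge-group fractions."""
    comp, charge = composition(seq), charge_composition(seq)
    return [comp[aa] for aa in AMINO_ACIDS] + [charge[k] for k in ("positive", "negative", "neutral")]
```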

20.
The pentatricopeptide repeat (PPR) is a helical repeat motif found in an exceptionally large family of RNA-binding proteins that function in mitochondrial and chloroplast gene expression. PPR proteins harbor between 2 and 30 repeats and typically bind single-stranded RNA in a sequence-specific fashion. However, the basis for sequence-specific RNA recognition by PPR tracts has been unknown. We used computational methods to infer a code for nucleotide recognition involving two amino acids in each repeat, and we validated this model by recoding a PPR protein to bind novel RNA sequences in vitro. Our results show that PPR tracts bind RNA via a modular recognition mechanism that differs from previously described RNA-protein recognition modes and that underpins a natural library of specific protein/RNA partners of unprecedented size and diversity. These findings provide a significant step toward the prediction of native binding sites of the enormous number of PPR proteins found in nature. Furthermore, the extraordinary evolutionary plasticity of the PPR family suggests that the PPR scaffold will be particularly amenable to redesign for new sequence specificities and functions.
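A modular two-residue code means a target RNA can be read directly off the repeat sequence. The sketch below assumes the combinations commonly cited for the PPR code (residues at position 6 and position 1' of each repeat) and assumes parallel binding, in which the first repeat contacts the 5'-most nucleotide; treat the lookup table as illustrative rather than as the exact code reported here. Unknown combinations are reported as N.

```python
from typing import List, Tuple

# Assumed (position 6, position 1') -> nucleotide combinations; illustrative only.
PPR_CODE = {
    ("T", "N"): "A",
    ("S", "N"): "A",
    ("T", "D"): "G",
    ("N", "D"): "U",
    ("N", "N"): "Y",  # pyrimidine (C or U), IUPAC code Y
}


def predict_target(repeats: List[Tuple[str, str]]) -> str:
    """Predicted RNA target 5'->3', one nucleotide per repeat (parallel binding assumed)."""
    return "".join(PPR_CODE.get(pair, "N") for pair in repeats)


print(predict_target([("T", "N"), ("T", "D"), ("N", "D"), ("N", "N"), ("G", "G")]))  # AGUYN
```

Recoding a PPR protein, as described in the abstract, corresponds to running this lookup in reverse: choosing the residue pair for each repeat that matches the desired nucleotide.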

