Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
We present an RNA-As-Graphs (RAG) based inverse folding algorithm, RAG-IF, to design novel RNA sequences that fold onto target tree graph topologies. The algorithm can be used to enhance our recently reported computational design pipeline (Jain et al., NAR 2018). The RAG approach represents RNA secondary structures as tree and dual graphs, where RNA loops and helices are coarse-grained as vertices and edges, enabling the use of graph theory methods to study, predict, and design RNA structures. Our recently developed computational pipeline for design utilizes graph partitioning (RAG-3D) and atomic fragment assembly (F-RAG) to design sequences to fold onto RNA-like tree graph topologies; the atomic fragments are taken from existing RNA structures that correspond to tree subgraphs. Because F-RAG may not produce the target folds for all designs, automated mutations by the RAG-IF algorithm enhance the candidate pool markedly. The crucial residues for mutation are identified by differences between the predicted and the target topology. A genetic algorithm then mutates the selected residues, and the successful sequences are optimized to retain only the minimal or essential mutations. Here we evaluate RAG-IF for 6 RNA-like topologies and generate a large pool of successful candidate sequences with a variety of minimal mutations. We find that RAG-IF adds robustness and efficiency to our RNA design pipeline, making inverse folding motivated by graph topology rather than secondary structure more productive.

2.
We present a new procedure for assessing the statistical significance of the most likely unrooted dichotomous topology inferrable from four DNA sequences. The procedure calculates directly a P-value for the support given to this topology by the informative sites congruent with it, assuming the most likely star topology as the null hypothesis. Informative sites are crucial in the determination of the maximum likelihood dichotomous topology and are therefore an obvious target for a statistical test of phylogenies. Our P-value is the probability of producing through parallel substitutions on the branches of the star topology at least as much support as that given to the maximum likelihood dichotomous topology by the aforementioned informative sites, for any of the three possible dichotomous topologies. The degree of statistical significance is simply the complement of this P-value. Ours is therefore an a posteriori testing approach, in which no dichotomous topology is specified in advance. We implement the test for the case in which all sites behave identically and the substitution model has a single parameter. Under these conditions, the P-value can be easily calculated on the basis of the probabilities of change on the branches of the most likely star topology, because under these assumptions, each site can become informative independently from every other site; accordingly, the total number of informative sites of each kind is binomially distributed. We explore the test's type I error by applying it to data produced in star topologies having all branches equally long, or having two short and two long branches, and various degrees of homoplasy. 
The test is conservative but we demonstrate, by means of a discreteness correction and progressively assumption-free calculations of the P-values, that (1) the conservativeness is mostly due to the discrete nature of informative sites and (2) the P-values calculated empirically are moreover mostly quite accurate in absolute terms. Applying the test to data produced in dichotomous topologies with increasing internal branch length shows that, despite the test's "conservativeness," its power is much higher than that of the bootstrap, especially when the relevant informative sites are few.
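
The binomial step in the abstract above can be illustrated with a minimal sketch. It assumes, hypothetically, that each of n sites independently becomes informative for one given topology with probability p under the star topology, so the chance of seeing at least k such sites is an upper binomial tail. The function name and parameters are illustrative and not taken from the paper, and the published P-value additionally considers all three dichotomous topologies, which this sketch does not.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    n: number of sites (assumed independent and identically behaving)
    k: observed number of informative sites supporting one topology
    p: assumed per-site probability of becoming informative by
       parallel substitutions on the star topology
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For instance, `binomial_upper_tail(1000, 12, 0.005)` would give the tail probability for 12 or more supporting sites among 1000 under these illustrative values; a small result means the support is unlikely to arise on a star topology.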

3.
MOTIVATION: The protein side-chain conformation problem is a central problem in proteomics with wide applications in protein structure prediction and design. Computational complexity results show that the problem is hard to solve. Yet, instances from realistic applications are large and demand fast and reliable algorithms. RESULTS: We propose a new global optimization algorithm, which for the first time integrates residue reduction and rotamer reduction techniques previously developed for the protein side-chain conformation problem. We show that the proposed approach dramatically simplifies the topology of the underlying residue graph. Computations show that our algorithm solves problems using only 1-10% of the time required by the mixed-integer linear programming approach available in the literature. In addition, on a set of hard side-chain conformation problems, our algorithm runs 2-78 times faster than SCWRL 3.0, which is widely used for solving these problems. AVAILABILITY: The implementation is available as an online server at http://eudoxus.scs.uiuc.edu/r3.html

4.
5.
RNA molecules are important cellular components involved in many fundamental biological processes. Understanding the mechanisms behind their functions requires knowledge of their tertiary structures. Though computational RNA folding approaches exist, they often require manual manipulation and expert intuition; predicting global long-range tertiary contacts remains challenging. Here we develop a computational approach and associated program module (RNAJAG) to predict helical arrangements/topologies in RNA junctions. Our method has two components: junction topology prediction and graph modeling. First, junction topologies are determined by a data mining approach from a given secondary structure of the target RNAs; second, the predicted topology is used to construct a tree graph consistent with geometric preferences analyzed from solved RNAs. The predicted graphs, which model the helical arrangements of RNA junctions for a large set of 200 junctions using a cross validation procedure, yield fairly good representations compared to the helical configurations in native RNAs, and can be further used to develop all-atom models as we show for two examples. Because junctions are among the most complex structural elements in RNA, this work advances folding structure prediction methods of large RNAs. The RNAJAG module is available to academic users upon request.

6.
Characterizing enzyme sequences and identifying their active sites is a very important task. The current experimental methods are too expensive and labor intensive to handle the rapidly accumulating protein sequence and structure data. Thus accurate, high-throughput in silico methods for identifying catalytic residues and predicting enzyme function are much needed. In this paper, we propose a novel sequence-based catalytic domain prediction method that combines sequence clustering with an information-theoretic approach. The first step is to perform a sequence clustering analysis of enzyme sequences from the same functional category (those with the same EC label). The clustering analysis is used to handle the problem of widely varying sequence similarity levels in enzyme sequences. The clustering analysis constructs a sequence graph where nodes are enzyme sequences and edges connect pairs of sequences with a certain degree of sequence similarity, and uses graph properties, such as biconnected components and articulation points, to generate sequence segments common to the enzyme sequences. The amino acid subsequences in the common shared regions are then aligned, and an information-theoretic approach called the aggregated column related scoring scheme is applied to highlight potential active sites in enzyme sequences. The aggregated information content scoring scheme is shown to highlight active-site residues effectively. The proposed method of combining the clustering and the aggregated information content scoring methods was successful in highlighting known catalytic sites in enzymes of Escherichia coli K12 in terms of the Catalytic Site Atlas database. 
Our method is shown to be not only accurate in predicting potential active sites in the enzyme sequences but also computationally efficient, since the clustering approach utilizes two graph properties that can be computed in time linear in the number of edges in the sequence graph and the computation of mutual information does not require much time. We believe that the proposed method can be useful for identifying active sites of enzyme sequences from many genome projects.
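
One of the two graph properties mentioned above, articulation points, can be found with a single depth-first search. The following is a minimal sketch of the standard DFS low-link technique, not the authors' implementation; the function name and the adjacency-dictionary input format are assumptions made for illustration, and the graph is assumed to be simple and undirected.

```python
def articulation_points(adj):
    """Return the set of articulation points of an undirected simple graph.

    adj maps each node (e.g. a sequence identifier) to an iterable of its
    neighbours; every edge must appear in both directions.
    """
    disc, low, points = {}, {}, set()
    counter = [0]  # mutable DFS clock shared with the nested function

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:
                # Back edge: u can reach an earlier-discovered node.
                low[u] = min(low[u], disc[v])
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # A non-root u separates v's subtree if that subtree
                # cannot reach above u.
                if parent is not None and low[v] >= disc[u]:
                    points.add(u)
        # The DFS root is an articulation point iff it has 2+ DFS children.
        if parent is None and children > 1:
            points.add(u)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return points
```

Each node and edge is visited a constant number of times, which is where the linear running time comes from. The recursion is kept for readability; a very large sequence graph would need an iterative version to stay within Python's recursion limit.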

7.
The Server for Quick Alignment Reliability Evaluation (SQUARE) is a Web-based version of the method we developed to predict regions of reliably aligned residues in sequence alignments. Given an alignment between a query sequence and a sequence of known structure, SQUARE is able to predict which residues are reliably aligned. The server accesses a database of profiles of sequences of known three-dimensional structures in order to calculate the scores for each residue in the alignment. SQUARE produces a graphical output of the residue profile-derived alignment scores along with an indication of the reliability of the alignment. In addition, the scores can be compared against template secondary structure, conserved residues and important sites.

8.
MOTIVATION: Protein-protein docking algorithms typically generate large numbers of possible complex structures with only a few of them resembling the native structure. Recently (Duan et al., Protein Sci, 14:316-218, 2005), it was observed that the surface density of conserved residue positions is high at the interface regions of interacting protein surfaces, except for antibody-antigen complexes, where a lesser number of conserved positions than average is observed at the interface regions. Using this observation, we identified putative interacting regions on the surface of interacting partners and significantly improved docking results by assigning top ranks to near-native complex structures. In this paper, we combine the residue conservation information with a widely used shape complementarity algorithm to generate candidate complex structures with a higher percentage of near-native structures (hits). What is new in this work is that the conservation information is used early in the generation stage and not only in the ranking stage of the docking algorithm. This results in a significantly larger number of generated hits and an improved predictive ability in identifying the native structure of protein-protein complexes. RESULTS: We report on results from 48 well-characterized protein complexes, which have enough residue conservation information from the same 59 benchmark complexes used in our previous work. We compute conservation indices of residue positions on the surfaces of interacting proteins using available homologous sequences from UNIPROT and calculate the solvent accessible surface area. We combine this information with shape-complementarity scores to generate candidate protein-protein complex structures. When compared with pure shape-complementarity algorithms, performed by FTDock, our method results in significantly more hits, with the improvement being over 100% in many instances. 
We demonstrate that residue conservation information is useful not only in refinement and scoring of docking solutions, but also helpful in enrichment of near-native-structures during the generation of candidate geometries of complex structures.

9.

Background  

Protein topology representations such as residue contact maps are an important intermediate step towards ab initio prediction of protein structure. Although improvements have occurred over the last years, the problem of accurately predicting residue contact maps from primary sequences is still largely unsolved. Among the reasons for this are the unbalanced nature of the problem (with far fewer examples of contacts than non-contacts), the formidable challenge of capturing long-range interactions in the maps, and the intrinsic difficulty of mapping one-dimensional input sequences into two-dimensional output maps.

10.
Homology-derived secondary structure of proteins (HSSP) is a well-known database of multiple sequence alignments (MSAs) which merges information of protein sequences and their three-dimensional structures. It is available for all proteins whose structure is deposited in the PDB. It is also used by STING and (Java)Protein Dossier to calculate and present relative entropy as a measure of the degree of conservation for each residue of proteins whose structure has been solved and deposited in the PDB. However, if STING and (Java)Protein Dossier are to provide support for analysis of protein structures modeled in computers or being experimentally solved but not yet deposited in the PDB, then we need a new method for building alignments having a flavor of HSSP alignments (myMSAr). The present study describes a new method and its corresponding databank (SH2QS--database of sequences homologue to the query [structure-having] sequence). Our main interest in making myMSAr was to measure the degree of residue conservation for a given query sequence, regardless of whether it has a corresponding structure deposited in the PDB. In this study, we compare the measurement of residue conservation provided by corresponding alignments produced by HSSP and SH2QS. As a case study, we also present two biologically relevant examples, the first one highlighting the equivalence of analysis of the degree of residue conservation by using HSSP or SH2QS alignments, and the second one presenting the degree of residue conservation for a structure modeled in a computer, which, as a consequence, does not have an alignment reported by HSSP.
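
Relative entropy as a per-residue conservation measure can be sketched as follows. This is an illustration under stated assumptions, not the STING or SH2QS implementation: the background distribution is assumed uniform over the 20 standard amino acids, gaps and other symbols are ignored, and both function names are hypothetical.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_relative_entropy(column, background=None):
    """Relative entropy (in bits) of one alignment column against a background.

    column: iterable of one-letter residue codes; symbols not present in
            the background (gaps, 'X', ...) are skipped.
    background: dict residue -> probability; uniform if omitted (an assumption).
    """
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    residues = [c for c in column if c in background]
    if not residues:
        return 0.0
    total = len(residues)
    return sum(
        (n / total) * log2((n / total) / background[aa])
        for aa, n in Counter(residues).items()
    )


def alignment_conservation(alignment):
    """Score every column of an alignment given as equal-length strings."""
    return [column_relative_entropy(col) for col in zip(*alignment)]
```

Under the uniform background a fully conserved column scores log2(20) bits, and a column in which all 20 residues are equally frequent scores 0, so higher values mean stronger conservation.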

11.
S Waner  Y H Wu 《Bio Systems》1988,21(2):115-124
We propose an automata-theoretical framework for structured hierarchical control, in terms of rules and meta-rules, for sequences of moves on a graph. This leads to a notion of a "universal" hierarchically structured automaton mu which can move on a given graph in such a way as to emulate any automaton which moves on that graph in response to inputs. This emulation is achieved via a mapping of the inputs in the given automaton to those of mu, and we think of such a mapping as an encoding of the given automaton. We see in several examples that efficient encodings of graph-search algorithms correspond to their natural hierarchical structure (in terms of rules and meta-rules), and this leads one to a precise notion of the "depth" of an automaton which moves on a given graph. By way of application, we discuss a proposed structure of a series of stochastic neural networks which can learn, by example, to encode a given sequence of moves on a graph, so that the encoding obtained is structurally the "natural" one for the given sequence of moves. Thus, such a learning system would perform both structural pattern recognition (in terms of "patterns" of moves), and encoding based on a desired outcome.

12.
Drug resistance to HIV-1 protease involves the accumulation of multiple mutations in the protein. We investigate the role of these mutations by using molecular dynamics simulations that exploit the influence of the native-state topology in the folding process. Our calculations show that sites contributing to phenotypic resistance of FDA-approved drugs are among the most sensitive positions for the stability of partially folded states and should play a relevant role in the folding process. Furthermore, associations between amino acid sites mutating under drug treatment are shown to be statistically correlated. The striking correlation between clinical data and our calculations suggests a novel approach to the design of drugs tailored to bind regions crucial not only for protein function, but for folding as well.

13.
In this paper, we present numerical evidence that supports the notion of minimization in the sequence space of proteins for a target conformation. We use the conformations of the real proteins in the Protein Data Bank (PDB) and present computationally efficient methods to identify the sequences with minimum energy. We use an edge-weighted connectivity graph for ranking the residue sites with a reduced amino acid alphabet and then use continuous optimization to obtain the energy-minimizing sequences. Our methods enable the computation of a lower bound as well as a tight upper bound for the energy of a given conformation. We validate our results by using three different inter-residue energy matrices for five proteins from the Protein Data Bank (PDB), and by comparing our energy-minimizing sequences with 80 million diverse sequences that are generated based on different considerations in each case. When we submitted some of our chosen energy-minimizing sequences to the Basic Local Alignment Search Tool (BLAST), we obtained some sequences from the non-redundant protein sequence database that are similar to ours with an E-value of the order of 10^-7. In summary, we conclude that proteins show a trend towards minimizing energy in the sequence space but do not seem to adopt the global energy-minimizing sequence. The reason for this could be either that the existing energy matrices are not able to accurately represent the inter-residue interactions in the context of the protein environment or that Nature does not push the optimization in the sequence space, once it is able to perform the function.

14.
Plant defence signalling response against various pathogens, including viruses, is a complex phenomenon. In a resistant interaction, a plant cell perceives the pathogen signal, transduces it within the cell and performs a reprogramming of the cell metabolism leading to the pathogen replication arrest. This work focuses on signalling pathways crucial for the plant defence response, i.e., the salicylic acid, jasmonic acid and ethylene signal transduction pathways, in the Arabidopsis thaliana model plant. The initial signalling network topology was constructed manually by defining the representation formalism, encoding the information from public databases and literature, and composing a pathway diagram. The manually constructed network structure consists of 175 components and 387 reactions. In order to complement the network topology with possibly missing relations, a new approach to automated information extraction from biological literature was developed. This approach, named Bio3graph, allows for automated extraction of biological relations from the literature, resulting in a set of (component1, reaction, component2) triplets and composing a graph structure which can be visualised, compared to the manually constructed topology and examined by the experts. Using a plant defence response vocabulary of components and reaction types, Bio3graph was applied to a set of 9,586 relevant full text articles, resulting in 137 newly detected reactions between the components. Finally, the manually constructed topology and the new reactions were merged to form a network structure consisting of 175 components and 524 reactions. The resulting pathway diagram of plant defence signalling represents a valuable source for further computational modelling and interpretation of omics data. 
The developed Bio3graph approach, implemented as an executable language processing and graph visualisation workflow, is publicly available at http://ropot.ijs.si/bio3graph/ and can be utilised for modelling other biological systems, given that an adequate vocabulary is provided.
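
The merge of the manual topology with the extracted relations can be pictured as a set union over (component1, reaction, component2) triplets. The sketch below is an assumption about the bookkeeping only, not the Bio3graph workflow itself; the function name is hypothetical. The reported totals (387 + 137 = 524 reactions over the same 175 components) are consistent with such a union when the new reactions are disjoint from the manual ones.

```python
def merge_networks(manual, extracted):
    """Merge two collections of (component1, reaction, component2) triplets.

    Returns a tuple (merged, new, components):
      merged     - union of both triplet sets
      new        - extracted triplets absent from the manual set
      components - every component named in the merged network
    """
    manual_set, extracted_set = set(manual), set(extracted)
    merged = manual_set | extracted_set
    new = extracted_set - manual_set
    components = {c for first, _, second in merged for c in (first, second)}
    return merged, new, components
```

With placeholder data, merging `[("c1", "activates", "c2")]` with `[("c1", "activates", "c2"), ("c2", "inhibits", "c3")]` yields two reactions, one of them new, over three components.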

15.
16.
We have investigated the transmembrane topology of the amino-terminal domain of the alpha subunit of the mouse muscle nicotinic acetylcholine receptor synthesized in vitro and in vivo. Using oligonucleotide-directed mutagenesis we introduced new glycosylation consensus sequences at alpha 154 and at alpha 200. For each novel site, additional constructs were made in which the original site at alpha N141 was eliminated. Glycosylation at the new sites, as exhibited in a rabbit reticulocyte cell-free translation system supplemented with canine pancreatic microsomes and in a transient transfection system with COS cells, was taken as evidence of the transmembrane translocation of the new site. Each of the new sites was glycosylated in both systems. In separate experiments we found that an alpha subunit fragment terminating at alpha M207 could be extracted from microsomal membranes with sodium carbonate after in vitro translation, indicating that this fragment is not an integral membrane protein. Our results, taken together with previous experiments, indicate that the amino terminus of the alpha subunit up to at least residue alpha 207 is translocated across the membrane of the endoplasmic reticulum. This topology probably represents the orientation of the amino terminus of the alpha subunit in the assembled receptor.

17.
Background: Accumulated evidence indicates that the bacterial ribosome employs allostery throughout its structure for protein synthesis. The nature of the allosteric communication between remote functional sites remains unclear, but the contact topology and dynamics of residues may play a role in transmission of a perturbation to distant sites. Methods/results: We employ two computationally efficient approaches, graph and elastic network modeling, to gain insights about the allosteric communication in the ribosome. Using a graph representation of the structure, we perform k-shortest pathways analysis between the peptidyl transferase center and the ribosomal tunnel, and between the decoding center and the peptidyl transferase center, which are previously reported functional sites with allosteric communication. Detailed analysis on intact structures points to common and alternative shortest pathways preferred by different states of translation. All shortest pathways capture drug target sites and allosterically important regions. The elastic network model further reveals that residues along all pathways have the ability to quickly establish pair-wise communication and to help the propagation of a perturbation over long ranges during functional motions of the complex. Conclusions: Contact topology and inherent dynamics of the ribosome configure potential communication pathways between functional sites in different translation states. Inter-subunit bridges B2a, B3 and P-tRNA come forward for their high potential in assisting allostery during translation. Especially B3 emerges as a potential druggable site. General significance: This study indicates that the ribosome topology forms a basis for allosteric communication, which can be disrupted by novel drugs to kill drug-resistant bacteria. Our computationally efficient approach not only overlaps with experimental evidence on allosteric regulation in the ribosome but also proposes new druggable sites.

18.
The traditional approach of using homologous sequences to elucidate the role of specific amino acid residues in protein structure and function becomes more meaningful as the number of differences is minimized, with the limit being alteration of a single residue. For small proteins in solution, NMR spectroscopy offers a means of obtaining detailed information about each residue and its response to a given change in the protein sequence. Extraction of this information has been aided by recent progress in spectrometer technology (higher magnetic fields, more sensitive signal detection, more sophisticated computers) and experimental strategies (new NMR pulse sequences including multiple-quantum and two-dimensional NMR methods). The set of avian ovomucoid third domains, which consists of the third domain proper plus a short leader (connecting peptide) and has a maximum of 56 amino acid residues, offers an attractive system for developing experimental methods for investigating sequence-structure and structure-function relationships in proteins. Our NMR results provide examples of sequence effects on pKa' values, average conformation, and internal motion of amino acid side chains.

19.
Integral membrane proteins usually have a predominantly alpha-helical secondary structure in which transmembrane segments are connected by membrane-extrinsic loops. Although a number of membrane protein structures have been reported in recent years, in most cases transmembrane topologies are initially predicted using a variety of theoretical techniques, including hydropathy analyses and the "positive inside" rule. We have explored the use of plots of the distribution of sequence similarity within families of membrane proteins comprising homeomorphic domains as a new method for the prediction/verification of the orientation of transmembrane topology models within certain families of multimeric respiratory chain enzymes. Within such proteins, analyses of sequence similarity can: i) identify heme and/or quinol binding sites; ii) identify potential electron-transfer conduits to/from prosthetic groups; and iii) locate regions defining potential subunit-subunit interactions. We mined emerging bioinformatic data for sequences of 11 families of membrane-intrinsic proteins that are part of multimeric respiratory chain complexes that also have membrane-extrinsic subunits. The sequences of each family were then aligned and the resultant alignments converted into a graphical format recording an empirical measure of the sequence similarity plotted versus residue position. In each case, this plot was compared to the predicted transmembrane topology. With one exception, there is a strong correlation between the existence

20.
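Fu Xin  Xu Zhenyuan 《生物信息学》 (Bioinformatics) 2007,5(3):113-116
We apply a new graph-theoretic method for analyzing DNA sequences (fragments): the topological structure of an organism is studied through complex networks, chiefly by measuring the clustering coefficient to characterize the network topology. A correlation network for the selected DNA sequences (fragments) was constructed from the prefix and suffix association properties of the sequences. The network's distribution was found to follow a power law, and the network has a large clustering coefficient. The results show that the constructed network has the characteristics of both small-world and scale-free networks, indicating that DNA sequences are not entirely random but have a definite structure subject to random perturbation.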
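
A prefix/suffix network and its clustering coefficient can be sketched as follows. The abstract does not give the exact construction, so everything here is an assumption made for illustration: nodes are the distinct length-k fragments of a sequence, two fragments are linked when the (k-1)-suffix of one equals the (k-1)-prefix of the other, and the average clustering coefficient is computed on the resulting undirected graph. Both function names are hypothetical.

```python
from itertools import combinations


def overlap_network(sequence, k=3):
    """Build an undirected prefix/suffix overlap graph over distinct k-mers.

    Assumed rule: fragments a and b are linked when a[1:] == b[:-1].
    Returns a dict mapping each k-mer to the set of its neighbours.
    """
    kmers = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    adj = {kmer: set() for kmer in kmers}
    for a in kmers:
        for b in kmers:
            if a != b and a[1:] == b[:-1]:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    if not adj:
        return 0.0
    total = 0.0
    for neighbours in adj.values():
        degree = len(neighbours)
        if degree < 2:
            continue
        # Count edges that exist among this node's neighbours.
        links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
        total += 2 * links / (degree * (degree - 1))
    return total / len(adj)
```

Calling `average_clustering(overlap_network(dna, k=3))` on a sequence string `dna` gives one number that can be compared against the value for a shuffled copy of the same sequence, which is one simple way to check whether the clustering exceeds what a random sequence would produce.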


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号