
1.

Using the Fast Fourier Transform to Accelerate the Computational Search for RNA Conformational Switches

Evan Senter Saad Sheikh Ivan Dotu Yann Ponty Peter Clote 《PloS one》2012,7(12)

Using complex roots of unity and the Fast Fourier Transform, we design a new thermodynamics-based algorithm, FFTbor, that computes the Boltzmann probability that secondary structures differ by $k$ base pairs from an arbitrary initial structure of a given RNA sequence. The algorithm, which runs in quartic time $O(n^4)$ and quadratic space $O(n^2)$, is used to determine the correlation between kinetic folding speed and the ruggedness of the energy landscape, and to predict the location of riboswitch expression platform candidates. A web server is available at http://bioinformatics.bc.edu/clotelab/FFTbor/.
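The roots-of-unity idea in this abstract can be illustrated with a small sketch. It assumes a polynomial whose coefficient of $x^k$ is the quantity of interest (here a hypothetical list `TOY_WEIGHTS` standing in for the probabilities of differing by $k$ base pairs), evaluates it at the complex $n$-th roots of unity, and recovers the coefficients with an inverse discrete Fourier transform. For clarity the inverse transform is written as a naive quadratic-time sum rather than a true FFT, and none of the names below come from the FFTbor software.

```python
import cmath

# Hypothetical coefficients; in the setting of the abstract these would be
# unknown and only the polynomial's values could be computed.
TOY_WEIGHTS = [0.1, 0.2, 0.3, 0.4]


def toy_polynomial(x):
    """Evaluate sum_k TOY_WEIGHTS[k] * x**k at a complex point x."""
    return sum(w * x ** k for k, w in enumerate(TOY_WEIGHTS))


def coefficients_from_roots_of_unity(evaluate, n):
    """Recover the n coefficients of a polynomial of degree < n.

    The polynomial is sampled at the n-th roots of unity, then a naive
    inverse DFT is applied (an FFT would give the same result faster).
    """
    values = [evaluate(cmath.exp(2j * cmath.pi * r / n)) for r in range(n)]
    coeffs = []
    for k in range(n):
        total = sum(
            values[r] * cmath.exp(-2j * cmath.pi * r * k / n) for r in range(n)
        )
        coeffs.append((total / n).real)
    return coeffs


probabilities = coefficients_from_roots_of_unity(toy_polynomial, len(TOY_WEIGHTS))
```

The recovered list matches `TOY_WEIGHTS` up to floating-point error, which is the property that lets coefficients be read off from evaluations alone.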

2.

Computing folding pathways between RNA secondary structures

Ivan Dotu William A. Lorenz Pascal Van Hentenryck Peter Clote 《Nucleic acids research》2010,38(5):1711-1722

Given an RNA sequence and two designated secondary structures A, B, we describe a new algorithm that computes a nearly optimal folding pathway from A to B. The algorithm, RNAtabupath, employs a tabu semi-greedy heuristic, known to be an effective search strategy in combinatorial optimization. Folding pathways, sometimes called routes or trajectories, are computed by RNAtabupath in a fraction of the time required by the barriers program of Vienna RNA Package. We benchmark RNAtabupath with other algorithms to compute low energy folding pathways between experimentally known structures of several conformational switches. The RNApathfinder web server, source code for algorithms to compute and analyze pathways and supplementary data are available at http://bioinformatics.bc.edu/clotelab/RNApathfinder.

3.

Complete RNA inverse folding: computational design of functional hammerhead ribozymes

Ivan Dotu Juan Antonio Garcia-Martin Betty L. Slinger Vinodh Mechery Michelle M. Meyer Peter Clote 《Nucleic acids research》2014,42(18):11752-11762

Nanotechnology and synthetic biology currently constitute one of the most innovative, interdisciplinary fields of research, poised to radically transform society in the 21st century. This paper concerns the synthetic design of ribonucleic acid molecules, using our recent algorithm, RNAiFold, which can determine all RNA sequences whose minimum free energy secondary structure is a user-specified target structure. Using RNAiFold, we design ten cis-cleaving hammerhead ribozymes, all of which are shown to be functional by a cleavage assay. We additionally use RNAiFold to design a functional cis-cleaving hammerhead as a modular unit of a synthetic larger RNA. Analysis of kinetics on this small set of hammerheads suggests that cleavage rate of computationally designed ribozymes may be correlated with positional entropy, ensemble defect, structural flexibility/rigidity and related measures. Artificial ribozymes have been designed in the past either manually or by SELEX (Systematic Evolution of Ligands by Exponential Enrichment); however, this appears to be the first purely computational design and experimental validation of novel functional ribozymes. RNAiFold is available at http://bioinformatics.bc.edu/clotelab/RNAiFold/.

4.

RNA Thermodynamic Structural Entropy

Juan Antonio Garcia-Martin Peter Clote 《PloS one》2015,10(11)

Conformational entropy for atomic-level, three dimensional biomolecules is known experimentally to play an important role in protein-ligand discrimination, yet reliable computation of entropy remains a difficult problem. Here we describe the first two accurate and efficient algorithms to compute the conformational entropy for RNA secondary structures, with respect to the Turner energy model, where free energy parameters are determined from UV absorption experiments. An algorithm to compute the derivational entropy for RNA secondary structures had previously been introduced, using stochastic context free grammars (SCFGs). However, the numerical value of derivational entropy depends heavily on the chosen context free grammar and on the training set used to estimate rule probabilities. Using data from the Rfam database, we determine that both of our thermodynamic methods, which agree in numerical value, are substantially faster than the SCFG method. Thermodynamic structural entropy is much smaller than derivational entropy, and the correlation between length-normalized thermodynamic entropy and derivational entropy is moderately weak to poor. In applications, we plot the structural entropy as a function of temperature for known thermoswitches, such as the repression of heat shock gene expression (ROSE) element, we determine that the correlation between hammerhead ribozyme cleavage activity and total free energy is improved by including an additional free energy term arising from conformational entropy, and we plot the structural entropy of windows of the HIV-1 genome. Our software RNAentropy can compute structural entropy for any user-specified temperature, and supports both the Turner’99 and Turner’04 energy parameters. It follows that RNAentropy is state-of-the-art software to compute RNA secondary structure conformational entropy. 
Source code is available at https://github.com/clotelab/RNAentropy/; a full web server is available at http://bioinformatics.bc.edu/clotelab/RNAentropy, including source code and ancillary programs.

5.

Asymptotic Number of Hairpins of Saturated RNA Secondary Structures

Peter Clote Evangelos Kranakis Danny Krizanc 《Bulletin of mathematical biology》2013,75(12):2410-2430

In the absence of chaperone molecules, RNA folding is believed to depend on the distribution of kinetic traps in the energy landscape of all secondary structures. Kinetic traps in the Nussinov energy model are precisely those secondary structures that are saturated, meaning that no base pair can be added without introducing either a pseudoknot or base triple. In this paper, we compute the asymptotic expected number of hairpins in saturated structures. For instance, if every hairpin is required to contain at least θ=3 unpaired bases and the probability that any two positions can base-pair is p=3/8, then the asymptotic number of saturated structures is $1.34685 \cdot n^{-3/2} \cdot 1.62178^n$, and the asymptotic expected number of hairpins follows a normal distribution with mean $0.06695640 \cdot n + 0.01909350 \cdot\sqrt{n} \cdot\mathcal{N}$. Similar results are given for values θ=1,3, and p=1,1/2,3/8; for instance, when θ=1 and p=1, the asymptotic expected number of hairpins in saturated secondary structures is $0.123194 \cdot n$, a value greater than the asymptotic expected number $0.105573 \cdot n$ of hairpins over all secondary structures. Since RNA binding targets are often found in hairpin regions, it follows that saturated structures present potentially more binding targets than nonsaturated structures, on average. Next, we describe a novel algorithm to compute the hairpin profile of a given RNA sequence: given RNA sequence $a_1,\ldots,a_n$, for each integer $k$, we compute that secondary structure $S_k$ having minimum energy in the Nussinov energy model, taken over all secondary structures having $k$ hairpins. We expect that an extension of our algorithm to the Turner energy model may provide more accurate structure prediction for particular RNAs, such as tRNAs and purine riboswitches, known to have a particular number of hairpins. Mathematica computations, C and Python source code, and additional supplementary information are available at the website http://bioinformatics.bc.edu/clotelab/RNAhairpinProfile/.

6.

ProbKnot: Fast prediction of RNA secondary structure including pseudoknots

Stanislav Bellaousov David H. Mathews 《RNA (New York, N.Y.)》2010,16(10):1870-1880

It is a significant challenge to predict RNA secondary structures including pseudoknots. Here, a new algorithm capable of predicting pseudoknots of any topology, ProbKnot, is reported. ProbKnot assembles maximum expected accuracy structures from computed base-pairing probabilities in O(N²) time, where N is the length of the sequence. The performance of ProbKnot was measured by comparing predicted structures with known structures for a large database of RNA sequences with fewer than 700 nucleotides. The percentage of known pairs correctly predicted was 69.3%. Additionally, the percentage of predicted pairs in the known structure was 61.3%. This performance is the highest of four tested algorithms that are capable of pseudoknot prediction. The program is available for download at: http://rna.urmc.rochester.edu/RNAstructure.html.
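One simple way to assemble a structure from a base-pairing probability matrix in O(N²) time is sketched below. The abstract does not spell out the assembly rule, so the mutual-best-partner criterion used here (pair i with j when each is the other's most probable partner) is an assumption for illustration, as are the function name `mutual_best_pairs`, the `min_prob` threshold, and the toy matrix. Because no nesting constraint is imposed, crossing (pseudoknotted) pairs are allowed.

```python
def mutual_best_pairs(prob, min_prob=0.0):
    """Return pairs (i, j), i < j, where i and j are each other's best partner.

    prob is a symmetric N x N list of lists of pairing probabilities.
    Finding every row maximum costs O(N^2) overall.
    """
    n = len(prob)
    # Most probable partner of every position.
    best = [max(range(n), key=lambda j, i=i: prob[i][j]) for i in range(n)]
    pairs = []
    for i in range(n):
        j = best[i]
        if i < j and best[j] == i and prob[i][j] > min_prob:
            pairs.append((i, j))
    return pairs


# Toy 4-nucleotide probability matrix (hypothetical values).
toy_prob = [
    [0.0, 0.0, 0.3, 0.9],
    [0.0, 0.0, 0.6, 0.0],
    [0.3, 0.6, 0.0, 0.0],
    [0.9, 0.0, 0.0, 0.0],
]
predicted = mutual_best_pairs(toy_prob)
```

In the toy matrix, position 2 prefers position 1 (0.6) over position 0 (0.3), so the weaker 0-2 pair is discarded and the result is the two pairs (0, 3) and (1, 2).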

7.

DsTRD: Danshen Transcriptional Resource Database

Yuxuan Shao Jiabo Wei Fangli Wu Haihua Zhang Dongfeng Yang Zongsuo Liang Weibo Jin 《PloS one》2016,11(2)


8.

Prediction of S-Glutathionylation Sites Based on Protein Sequences

Chenglei Sun Zheng-Zheng Shi Xiaobo Zhou Luonan Chen Xing-Ming Zhao 《PloS one》2013,8(2)

S-glutathionylation, the reversible formation of mixed disulfides between glutathione (GSH) and cysteine residues in proteins, is a specific form of post-translational modification that plays important roles in various biological processes, including signal transduction, redox homeostasis, and metabolism inside cells. Experimentally identifying S-glutathionylation sites is labor-intensive and time-consuming, whereas bioinformatics methods provide an alternative approach to this problem by predicting S-glutathionylation sites in silico. The bioinformatics approaches give not only candidate sites for further experimental verification but also biochemical insights into the mechanism of S-glutathionylation. In this paper, we first collect experimentally determined S-glutathionylated proteins and their corresponding modification sites from the literature, and then propose a new method for predicting S-glutathionylation sites by employing machine learning methods based on protein sequence data. Promising results are obtained by our method with an AUC (area under ROC curve) score of 0.879 in 5-fold cross-validation, which demonstrates the predictive power of our proposed method. The datasets used in this work are available at http://csb.shu.edu.cn/SGDB.
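For readers unfamiliar with the AUC score quoted in this abstract, the sketch below computes it from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `auc` and the toy labels and scores are illustrative assumptions and are unrelated to the paper's data or classifier.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (positive) or 0 (negative)
    scores: predicted scores, higher meaning more likely positive
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ranked correctly.
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In a 5-fold cross-validation setting, a score like this would be computed on each held-out fold (or on the pooled held-out predictions) and then summarized.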

9.

Sebnif: An Integrated Bioinformatics Pipeline for the Identification of Novel Large Intergenic Noncoding RNAs (lincRNAs) - Application in Human Skeletal Muscle Cells

Kun Sun Yu Zhao Huating Wang Hao Sun 《PloS one》2014,9(1)


10.

From Gigabyte to Kilobyte: A Bioinformatics Protocol for Mining Large RNA-Seq Transcriptomics Data

Jilong Li Jie Hou Lin Sun Jordan Maximillian Wilkins Yuan Lu Chad E. Niederhuth Benjamin Ryan Merideth Thomas P. Mawhinney Valeri V. Mossine C. Michael Greenlief John C. Walker William R. Folk Mark Hannink Dennis B. Lubahn James A. Birchler Jianlin Cheng 《PloS one》2015,10(4)

RNA-Seq techniques generate hundreds of millions of short RNA reads using next-generation sequencing (NGS). These RNA reads can be mapped to reference genomes to investigate changes of gene expression, but improved procedures for mining large RNA-Seq datasets to extract valuable biological knowledge are needed. RNAMiner, a multi-level bioinformatics protocol and pipeline, has been developed for such datasets. It includes five steps: mapping RNA-Seq reads to a reference genome, calculating gene expression values, identifying differentially expressed genes, predicting gene functions, and constructing gene regulatory networks. To demonstrate its utility, we applied RNAMiner to datasets generated from Human, Mouse, Arabidopsis thaliana, and Drosophila melanogaster cells, and successfully identified differentially expressed genes, clustered them into cohesive functional groups, and constructed novel gene regulatory networks. The RNAMiner web service is available at http://calla.rnet.missouri.edu/rnaminer/index.html.

11.

Decoding the complex genetic causes of heart diseases using systems biology

Djordje Djordjevic Vinita Deshpande Tomasz Szczesnik Andrian Yang David T. Humphreys Eleni Giannoulatou Joshua W. K. Ho 《Biophysical reviews》2015,7(1):141-159


12.

HAAD: A Quick Algorithm for Accurate Prediction of Hydrogen Atoms in Protein Structures

Yunqi Li Ambrish Roy Yang Zhang 《PloS one》2009,4(8)

Hydrogen atoms constitute nearly half of all atoms in proteins, and their positions are essential for analyzing hydrogen-bonding interactions and refining atomic-level structures. However, most protein structures determined by experiments or computer prediction lack hydrogen coordinates. We present a new algorithm, HAAD, to predict the positions of hydrogen atoms based on the positions of heavy atoms. The algorithm is built on the basic rules of orbital hybridization followed by the optimization of steric repulsion and electrostatic interactions. We tested the algorithm using three independent data sets: ultra-high-resolution X-ray structures, structures determined by neutron diffraction, and NOE proton-proton distances. Compared with the widely used programs CHARMM and REDUCE, HAAD has a significantly higher accuracy, with the average RMSD of the predicted hydrogen atoms to the X-ray and neutron diffraction structures decreased by 26% and 11%, respectively. Furthermore, hydrogen atoms placed by HAAD have more matches with the NOE restraints and fewer clashes with heavy atoms. The average CPU cost of HAAD is 18 and 8 times lower than that of CHARMM and REDUCE, respectively. The significant advantage of HAAD in both the accuracy and the speed of the hydrogen additions should make HAAD a useful tool for the detailed study of protein structure and function. Both an executable and the source code of HAAD are freely available at http://zhang.bioinformatics.ku.edu/HAAD.

13.

Scalable Steady State Analysis of Boolean Biological Regulatory Networks

Ferhat Ay Fei Xu Tamer Kahveci 《PloS one》2009,4(12)

Background

Computing the long term behavior of regulatory and signaling networks is critical in understanding how biological functions take place in organisms. Steady states of these networks determine the activity levels of individual entities in the long run. Identifying all the steady states of these networks is difficult due to the state space explosion problem.

Methodology

In this paper, we propose a method for identifying all the steady states of Boolean regulatory and signaling networks accurately and efficiently. We build a mathematical model that allows pruning a large portion of the state space quickly without causing any false dismissals. For the remaining state space, which is typically very small compared to the whole state space, we develop a randomized traversal method that extracts the steady states. We estimate the number of steady states, and the expected behavior of individual genes and gene pairs in steady states, in an online fashion. We also formulate a stopping criterion that terminates the traversal as soon as a user-supplied percentage of the results has been returned with high confidence.
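To make the notion of a steady state concrete, the sketch below enumerates every state of a tiny Boolean network and keeps those that a synchronous update maps to themselves. This is a brute-force baseline that scales as 2^n and is exactly what the pruning and randomized traversal described above are meant to avoid; it is not the authors' method. The function name `steady_states` and the three-gene toy rules are hypothetical.

```python
from itertools import product


def steady_states(rules):
    """Return all fixed points of a synchronously updated Boolean network.

    rules maps each gene name to a function taking the current state
    (a dict of gene name -> bool) and returning that gene's next value.
    States are reported as tuples ordered by sorted gene name.
    """
    names = sorted(rules)
    fixed = []
    for values in product([False, True], repeat=len(names)):
        state = dict(zip(names, values))
        next_values = tuple(bool(rules[name](state)) for name in names)
        if next_values == values:  # state reproduces itself: steady state
            fixed.append(values)
    return fixed


# Toy network: A and B activate each other; C needs both A and B.
toy_rules = {
    "A": lambda s: s["B"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}
toy_fixed_points = steady_states(toy_rules)
```

The toy network has two steady states, all genes off and all genes on, out of eight possible states.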

Conclusions

This method identifies the observed steady states of Boolean biological networks computationally. Our algorithm successfully reported the G1 phases of both budding and fission yeast cell cycles. In addition, the experiments suggest that this method is useful in identifying co-expressed genes as well. By analyzing the steady state profile of the Hedgehog network, we were able to find the highly co-expressed gene pair GL1-SMO together with other such pairs.

Availability

Source code of this work is available at http://bioinformatics.cise.ufl.edu/palSteady.html

14.

SCNProDB: A database for the identification of soybean cyst nematode proteins

Savithiry Natarajan Mona Tavakolan Nadim W Alkharouf Benjamin F Matthews 《Bioinformation》2014,10(6):387-389

Soybean cyst nematode (Heterodera glycines, SCN) is the most destructive pathogen of soybean around the world. Crop rotation and resistant cultivars are used to mitigate the damage of SCN, but these approaches are not completely successful because of the varied SCN populations. Thus, the limitations of these practices dictate investigation of other avenues of protecting soybean against SCN, perhaps through genetic engineering of broad resistance to SCN. For better understanding of the consequences of genetic manipulation, elucidation of SCN protein composition at the subunit level is necessary. In our laboratory, we have conducted studies to determine the composition of SCN proteins using a proteomics approach, with two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) to separate SCN proteins and mass spectrometry to characterize them further. Our analysis resulted in the identification of several hundred proteins. In this investigation, we developed a web-based database (SCNProDB) containing protein information obtained from our previously published studies. This database will be useful to scientists who wish to develop SCN-resistant soybean varieties through genetic manipulation and breeding efforts. The database is freely accessible from: http://bioinformatics.towson.edu/Soybean_SCN_proteins_2D_Gel_DB/Gel1.aspx

15.

Combinatorics of saturated secondary structures of RNA.

P Clote 《Journal of computational biology》2006,13(9):1640-1657

Following Zuker (1986), a saturated secondary structure for a given RNA sequence is a secondary structure such that no base pair can be added without violating the definition of secondary structure, e.g., without introducing a pseudoknot. In the Nussinov-Jacobson energy model (Nussinov and Jacobson, 1980), where the energy of a secondary structure is -1 times the number of base pairs, saturated secondary structures are local minima in the energy landscape, hence form kinetic traps during the folding process. Here we present recurrence relations and closed form asymptotic limits for combinatorial problems related to the number of saturated secondary structures. In addition, Python source code to compute the number of saturated secondary structures having k base pairs can be found at the web servers link of bioinformatics.bc.edu/clotelab/.
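The two definitions in this abstract (Nussinov-Jacobson energy and saturation) can be checked directly on a dot-bracket structure, as in the sketch below. It assumes Watson-Crick and GU wobble pairs, a minimum of `theta` unpaired bases in a hairpin loop, and a brute-force test over all pairs of unpaired positions. The function names and test sequences are illustrative, and this is not the counting code referred to in the abstract.

```python
# Allowed base pairs (assumption: Watson-Crick plus GU wobble).
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(structure):
    """Convert a balanced dot-bracket string into a list of (i, j) pairs."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.append((stack.pop(), pos))
    return pairs


def nussinov_energy(structure):
    """Energy is -1 times the number of base pairs."""
    return -len(parse_pairs(structure))


def is_saturated(sequence, structure, theta=1):
    """True if no base pair can be added without a pseudoknot or base triple."""
    pairs = parse_pairs(structure)
    paired = {p for pair in pairs for p in pair}
    free = [i for i in range(len(sequence)) if i not in paired]
    for a, i in enumerate(free):
        for j in free[a + 1:]:
            # A new pair must leave at least theta unpaired bases between i and j.
            if j - i <= theta or (sequence[i], sequence[j]) not in CAN_PAIR:
                continue
            crosses = any(i < k < j < l or k < i < l < j for k, l in pairs)
            if not crosses:
                return False  # (i, j) could be added, so not saturated
    return True


closed_hairpin = is_saturated("GGGAAACCC", "(((...)))", theta=3)
open_hairpin = is_saturated("GGGAAACCC", "((.....))", theta=3)
```

The fully stacked hairpin is saturated, while the structure with its innermost G-C pair left open is not, since that pair can still be added and would lower the energy by one.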

16.

De novo assembly of bacterial transcriptomes from RNA-seq data

Brian Tjaden 《Genome biology》2015,16(1)


17.

Crumple: A Method for Complete Enumeration of All Possible Pseudoknot-Free RNA Secondary Structures

Samuel Bleckley Jonathan W. Stone Susan J. Schroeder 《PloS one》2012,7(12)


18.

G-Quadruplexes Involving Both Strands of Genomic DNA Are Highly Abundant and Colocalize with Functional Sites in the Human Genome

Andrzej S Kudlicki 《PloS one》2016,11(1)


19.

Genome-wide identification of heat shock proteins (Hsps) and Hsp interactors in rice: Hsp70s as a case study

Yongfei Wang Shoukai Lin Qi Song Kuan Li Huan Tao Jian Huang Xinhai Chen Shufu Que Huaqin He 《BMC genomics》2014,15(1)

Background

Heat shock proteins (Hsps) perform a fundamental role in protecting plants against abiotic stresses. Although researchers have made great efforts on the functional analysis of individual family members, Hsps have not been fully characterized in rice (Oryza sativa L.) and little is known about their interactors.

Results

In this study, we combined an orthology-based approach with expression association data to screen for rice Hsps whose expression patterns strongly correlated with those of heat-responsive probe sets. Twenty-seven Hsp candidates were identified, including 12 small Hsps, six Hsp70s, three Hsp60s, three Hsp90s, and three clpB/Hsp100s. Then, using a combination of interolog and expression profile-based methods, we inferred 430 interactors of Hsp70s in rice, and validated the interactions by co-localization and function-based methods. Subsequent analysis showed that 13 interacting domains and 28 target motifs were over-represented in Hsp70 interactors. Twenty-four GO terms of biological processes and five GO terms of molecular functions were enriched in the positive interactors, whose expression levels were positively associated with Hsp70s. The Hsp70 interaction network implied that Hsp70s were involved in macromolecular translocation, carbohydrate metabolism, innate immunity, photosystem II repair and regulation of kinase activities.

Conclusions

Twenty-seven Hsps in rice were identified, and 430 interactors of Hsp70s were inferred and validated; the interaction network of Hsp70s was then derived and the function of Hsp70s was analyzed. Furthermore, two databases, Rice Heat Shock Proteins (RiceHsps) and Rice Gene Expression Profile (RGEP), and one online tool, Protein-Protein Interaction Predictor (PPIP), were constructed and can be accessed at http://bioinformatics.fafu.edu.cn/.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-344) contains supplementary material, which is available to authorized users.

20.

Genome Sequence of Aggregatibacter actinomycetemcomitans Serotype c Strain D11S-1

Casey Chen Weerayuth Kittichotirat Yan Si Roger Bumgarner 《Journal of bacteriology》2009,191(23):7378-7379

Aggregatibacter actinomycetemcomitans is a major etiological agent of periodontitis. Here we report the complete genome sequence of serotype c strain D11S-1, which was recovered from the subgingival plaque of a patient diagnosed with generalized aggressive periodontitis.

Aggregatibacter actinomycetemcomitans is a major etiologic agent of human periodontal disease, in particular aggressive periodontitis. The natural population of A. actinomycetemcomitans is clonal. Six A. actinomycetemcomitans serotypes are distinguished based on the structural and serological characteristics of the O antigen of LPS. Three of the serotypes (a, b, and c) comprise >80% of all strains, and each serotype represents a distinct clonal lineage. Serotype c strain D11S-1 was cultured from a subgingival plaque sample of a patient diagnosed with generalized aggressive periodontitis. The complete genome sequence of the strain was determined by 454 pyrosequencing, which achieved 25× coverage. Assembly was performed using the Newbler assembler (454, Branford, CT) and generated 199 large contigs, with 99.3% of the bases having a quality score of 40 and above. The contigs were aligned with the genome of the sequenced serotype b strain HK1651 (http://www.genome.ou.edu/act.html) using software written in house. The putative contig gaps were then closed by primer walking and sequencing of PCR products over the gaps. The final genome assembly was further confirmed by comparison of an in silico NcoI restriction map to the experimental map generated by optical mapping. The genome structure of the D11S-1 strain was compared to that of the sequenced strain HK1651 using the program MAUVE. The automated annotation was done using a protocol similar to the annotation engine service at The Institute for Genomic Research/J. Craig Venter Institute with some local modifications. Briefly, protein-coding genes were identified using Glimmer3. Each protein sequence was then annotated by comparing to the GenBank nonredundant protein database. BLAST-Extend-Repraze was applied to the predicted genes to identify genes that might have been truncated due to a frameshift mutation or premature stop codon. tRNA and rRNA genes were identified by using tRNAScan-SE and a similarity search to our in-house RNA database, respectively.

The D11S-1 circular genome contains 2,105,764 nucleotides, a GC content of 44.55%, 2,134 predicted coding sequences, and 54 tRNA and 19 rRNA genes (see additional data at http://expression.washington.edu/bumgarnerlab/publications.php). The distribution of predicted genes based on functional categories was similar between D11S-1 and HK1651 (http://expression.washington.edu/bumgarnerlab/publications.php). One hundred six and 86 coding sequences were unique to strain D11S-1 and HK1651, respectively (http://expression.washington.edu/bumgarnerlab/publications.php). Genomic islands were identified based on annotations for strain HK1651 and based on manual inspection of contiguous D11S-1 specific DNA regions with G+C bias (http://expression.washington.edu/bumgarnerlab/publications.php). Among 12 identified genomic islands, 5 (B, C, D, E, and G; cytolethal distending toxin gene cluster, tight adherence gene cluster, O-antigen biosynthesis and transport gene cluster, leukotoxin gene cluster, and lipooligosaccharide biosynthesis enzyme gene, respectively) correspond to islands 2 to 5 and 8 of strain HK1651 (http://www.oralgen.lanl.gov/). Island F (∼5 kb) is homologous to a portion of the 12.5-kb island 7 in HK1651. Five genomic islands (H to L) were unique to strain D11S-1. The remaining island (A) is a fusion of genomic islands 1 and 6 in strain HK1651. The genome of D11S-1 is largely in synteny with the genome of the sequenced serotype b strain HK1651 but contains several large-scale genomic rearrangements.

Strain D11S-1 harbors a 43-kb bacteriophage and two plasmids of 31 and 23 kb (http://expression.washington.edu/bumgarnerlab/publications.php). Excluding an ∼9-kb region of low homology, the phage showed >90% nucleotide sequence identity with AaΦ23. A 49-bp attB site was identified at coordinates 2,024,825 to 2,024,873. The location of the inserted phage was identified in the optical map of strain D11S-1 and further confirmed by PCR amplification and sequencing of the regions flanking the insertion site. A closed circular form of the phage was also detected in strain D11S-1 by PCR analysis of the phage ends. The 23-kb plasmid is homologous to pVT745 (92% nucleotide identity). The 31-kb plasmid is a novel plasmid. It has significant homologies in short regions (<2 kb) to Haemophilus influenzae biotype aegyptius plasmid pF1947 and other plasmids.