Similar documents
2.
Conformational entropy for atomic-level, three-dimensional biomolecules is known experimentally to play an important role in protein-ligand discrimination, yet reliable computation of entropy remains a difficult problem. Here we describe the first two accurate and efficient algorithms to compute the conformational entropy for RNA secondary structures, with respect to the Turner energy model, where free energy parameters are determined from UV absorption experiments. An algorithm to compute the derivational entropy for RNA secondary structures had previously been introduced, using stochastic context-free grammars (SCFGs). However, the numerical value of derivational entropy depends heavily on the chosen context-free grammar and on the training set used to estimate rule probabilities. Using data from the Rfam database, we determine that both of our thermodynamic methods, which agree in numerical value, are substantially faster than the SCFG method. Thermodynamic structural entropy is much smaller than derivational entropy, and the correlation between length-normalized thermodynamic entropy and derivational entropy is moderately weak to poor. In applications, we plot the structural entropy as a function of temperature for known thermoswitches, such as the repression of heat shock gene expression (ROSE) element, we determine that the correlation between hammerhead ribozyme cleavage activity and total free energy is improved by including an additional free energy term arising from conformational entropy, and we plot the structural entropy of windows of the HIV-1 genome. Our software RNAentropy can compute structural entropy for any user-specified temperature, and supports both the Turner’99 and Turner’04 energy parameters. It follows that RNAentropy is state-of-the-art software to compute RNA secondary structure conformational entropy. 
Source code is available at https://github.com/clotelab/RNAentropy/; a full web server is available at http://bioinformatics.bc.edu/clotelab/RNAentropy, including source code and ancillary programs.
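The thermodynamic structural entropy described above is the Shannon entropy of the Boltzmann distribution over secondary structures. The sketch below illustrates the definition only, assuming a tiny, hypothetical ensemble whose free energies (kcal/mol) are already enumerated; RNAentropy's efficient algorithms avoid explicit enumeration, and the gas constant, the 37 °C default and the function name are choices of this sketch.

```python
import math

R = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_entropy(energies, temp_c=37.0):
    """Shannon entropy (natural log, dimensionless) of the Boltzmann
    distribution over structures with the given free energies in kcal/mol.
    Multiply by R to express it in kcal/(mol*K)."""
    rt = R * (temp_c + 273.15)
    # Shifting by the minimum energy avoids overflow; probabilities are unchanged.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)  # (shifted) partition function
    probs = [w / z for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical ensemble of four structures (free energies in kcal/mol).
ensemble = [-5.2, -4.9, -3.1, 0.0]
print(boltzmann_entropy(ensemble, temp_c=37.0))
print(boltzmann_entropy(ensemble, temp_c=80.0))  # flatter distribution, higher entropy
```

Plotting this quantity against temperature is the kind of curve the abstract describes for thermoswitches such as the ROSE element.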

3.
Nanotechnology and synthetic biology currently constitute one of the most innovative, interdisciplinary fields of research, poised to radically transform society in the 21st century. This paper concerns the synthetic design of ribonucleic acid molecules, using our recent algorithm, RNAiFold, which can determine all RNA sequences whose minimum free energy secondary structure is a user-specified target structure. Using RNAiFold, we design ten cis-cleaving hammerhead ribozymes, all of which are shown to be functional by a cleavage assay. We additionally use RNAiFold to design a functional cis-cleaving hammerhead as a modular unit of a synthetic larger RNA. Analysis of kinetics on this small set of hammerheads suggests that cleavage rate of computationally designed ribozymes may be correlated with positional entropy, ensemble defect, structural flexibility/rigidity and related measures. Artificial ribozymes have been designed in the past either manually or by SELEX (Systematic Evolution of Ligands by Exponential Enrichment); however, this appears to be the first purely computational design and experimental validation of novel functional ribozymes. RNAiFold is available at http://bioinformatics.bc.edu/clotelab/RNAiFold/.

4.
Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, storage technology and visualization tools need to evolve to offer the scientific community fast and convenient access to these data. We introduce simultaneously a database system to store and query 3D genomic data (3DBG), and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions and, importantly, scales gracefully with the size of the data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/.

5.

Background

Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial, since many ncRNAs remain unknown and the computational models are resource-demanding. Currently, the human genome holds the best mammalian ncRNA annotation, the result of numerous efforts by several groups. However, a more direct strategy is desired for the increasing number of sequenced mammalian genomes, some of which, such as the pig, are relevant as disease models and production animals.

Results

We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure similarity search as well as class-specific methods, we obtained a conservative set with a total of 3,391 structured RNA loci, of which 1,011 and 2,314, respectively, hold strong sequence and structure similarity to structured RNAs in existing databases. The RNA loci cover 139 cis-regulatory element loci, 58 lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the aforementioned classes. When running the pipeline on a locally shuffled version of the genome, we obtained no matches at the highest confidence level. Additional analysis of RNA-seq data from a pooled library from 10 different pig tissues added another 165 miRNA loci, yielding an overall annotation of 3,556 structured RNA loci. This annotation represents our best effort at making an automated annotation. To further enhance reliability, 571 of the 3,556 structured RNAs were manually curated by methods depending on the RNA class, while 1,581 were declared pseudogenes. We further created a multiple alignment of pig against 20 representative vertebrates, from which RNAz predicted 83,859 de novo RNA loci with conserved RNA structures. 528 of the RNAz predictions overlapped with the homology-based annotation or novel miRNAs. We further present a substantial synteny analysis, which includes 1,004 lineage-specific de novo RNA loci and 4 ncRNA loci in the known annotation specific for Laurasiatheria (pig, cow, dolphin, horse, cat, dog, hedgehog).

Conclusions

We have obtained one of the most comprehensive annotations for structured ncRNAs of a mammalian genome, which is likely to play central roles in both health modelling and production. The core annotation is available in Ensembl 70 and the complete annotation is available at http://rth.dk/resources/rnannotator/susscr102/version1.02.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-459) contains supplementary material, which is available to authorized users.

6.
Determining the function of a non-coding RNA requires costly and time-consuming wet-lab experiments. For this reason, computational methods which ascertain the homology of a sequence and thereby deduce functionality and family membership are often exploited. In this fashion, newly sequenced genomes can be annotated in a completely computational way. Covariance models are commonly used to assign novel RNA sequences to a known RNA family. However, to construct such models, several examples of the family must already be known. Moreover, model building is the work of experts who manually edit the necessary RNA alignment and consensus structure. Our method, RNAlien, starting from a single input sequence, collects potential family member sequences by multiple iterations of homology search. RNA family models are constructed fully automatically for the sequences found. We have tested our method on a subset of the Rfam RNA family database. RNAlien models are a starting point to construct models of comparable sensitivity and specificity to manually curated ones from the Rfam database. The RNAlien tool and web server are available at http://rna.tbi.univie.ac.at/rnalien/.

8.
Numerous high-throughput sequencing studies have focused on detecting conventionally spliced mRNAs in RNA-seq data. However, non-standard RNAs arising through gene fusion, circularization or trans-splicing are often neglected. We introduce a novel, unbiased algorithm to detect splice junctions from single-end cDNA sequences. In contrast to other methods, our approach accommodates multi-junction structures. Our method compares favorably with competing tools for conventionally spliced mRNAs and, with gains of up to 40% in recall, systematically outperforms them on reads with multiple splits, trans-splicing and circular products. The algorithm is integrated into our mapping tool segemehl (http://www.bioinf.uni-leipzig.de/Software/segemehl/).
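A split read can be interpreted by comparing the order of its segments within the read with their order on the genome. The sketch below is a simplified, hypothetical classifier, not segemehl's algorithm or output format; it assumes each segment is given with its chromosome, strand, 0-based genomic start and 0-based offset within the read, and it ignores the distance between segments.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One aligned piece of a split read (hypothetical representation)."""
    chrom: str
    strand: str        # "+" or "-"
    genome_start: int  # 0-based leftmost genomic coordinate
    read_start: int    # 0-based offset of the segment within the read

def classify_junction(a: Segment, b: Segment) -> str:
    """Classify a two-segment split read by comparing read order with genomic order."""
    first, second = sorted((a, b), key=lambda s: s.read_start)
    if first.chrom != second.chrom or first.strand != second.strand:
        return "non-collinear (fusion or trans-splicing candidate)"
    # On the plus strand, the later read segment should map downstream.
    forward = second.genome_start > first.genome_start
    # On the minus strand, read order runs opposite to genomic coordinates.
    if first.strand == "-":
        forward = not forward
    return "collinear splice" if forward else "back-splice (circular candidate)"

# The second half of the read maps upstream of the first half on the same strand.
print(classify_junction(Segment("chr1", "+", 5000, 0), Segment("chr1", "+", 1000, 60)))
# back-splice (circular candidate)
```

Real split-read mappers must additionally handle more than two segments, mapping ambiguity and exact junction positions, which this sketch leaves out.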

9.
Cruz JA, Westhof E. Nature Methods 2011, 8(6):513-521
Structural RNA modules, sets of ordered non-Watson-Crick base pairs embedded between Watson-Crick pairs, have central roles as architectural organizers and sites of ligand binding in RNA molecules, and are recurrently observed in RNA families throughout the phylogeny. Here we describe a computational tool, RNA three-dimensional (3D) modules detection, or RMDetect, for identifying known 3D structural modules in single and multiple RNA sequences in the absence of any other information. Currently, four modules can be searched for: G-bulge loop, kink-turn, C-loop and tandem-GA loop. In control test sequences we found all of the known modules with a false discovery rate of 0.23. Scanning through 1,444 publicly available alignments, we identified 21 previously unreported modules and 141 known modules. RMDetect can be used to refine RNA 2D structure, assemble RNA 3D models, and search and annotate structured RNAs in genomic data.

10.
Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a python package) and https://github.com/EESI/seq2att (a command line tool).
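An attention layer of the kind described above scores each position of a read, turns the scores into weights with a softmax, and summarizes the read as the weighted average of per-position features. A generic pure-Python sketch of such attention pooling, not Read2Pheno's architecture: it assumes per-position feature vectors (for example from a convolutional or recurrent layer) and a learned scoring vector are already given, and all names and numbers are hypothetical.

```python
import math

def attention_pool(hidden, score_vector):
    """Softmax attention pooling over sequence positions.

    hidden: list of per-position feature vectors (L positions, each of size d).
    score_vector: vector of size d used to score each position.
    Returns (pooled vector of size d, attention weights of length L).
    """
    scores = [sum(h_k * w_k for h_k, w_k in zip(h, score_vector)) for h in hidden]
    m = max(scores)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    d = len(hidden[0])
    pooled = [sum(w * h[k] for w, h in zip(weights, hidden)) for k in range(d)]
    return pooled, weights

# Toy read of 4 positions with 2-dimensional features (made-up numbers).
hidden = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1], [0.0, 0.3]]
pooled, weights = attention_pool(hidden, score_vector=[1.0, 0.5])
print([round(w, 3) for w in weights])
```

Positions that receive large weights play the role of the "particularly informative" nucleotide regions mentioned in the abstract; in a trained model the scoring vector is learned rather than fixed.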

11.
A novel method is presented for predicting the common secondary structures and alignment of two homologous RNA sequences by sampling the ‘structural alignment’ space, i.e. the joint space of their alignments and common secondary structures. The structural alignment space is sampled according to a pseudo-Boltzmann distribution based on a pseudo-free energy change that combines base pairing probabilities from a thermodynamic model and alignment probabilities from a hidden Markov model. By virtue of the implicit comparative analysis between the two sequences, the method offers an improvement over single sequence sampling of the Boltzmann ensemble. A cluster analysis shows that the samples obtained from joint sampling of the structural alignment space cluster more closely than samples generated by the single sequence method. On average, the representative (centroid) structure and alignment of the most populated cluster in the sample of structures and alignments generated by joint sampling are more accurate than single sequence sampling and alignment based on sequence alone, respectively. The ‘best’ centroid structure that is closest to the known structure among all the centroids is, on average, more accurate than structure predictions of other methods. Additionally, cluster analysis identifies, on average, a few clusters, whose centroids can be presented as alternative candidates. The source code for the proposed method can be downloaded at http://rna.urmc.rochester.edu.
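A common definition of the centroid of a cluster of sampled structures is the set of base pairs that occur in more than half of the structures in the cluster. The sketch below uses that definition as an assumption, not necessarily the paper's exact procedure, and represents each sampled structure as a set of (i, j) pairs.

```python
from collections import Counter

def centroid(structures):
    """Base pairs present in more than half of the given structures.

    structures: list of sets of (i, j) base pairs, e.g. one cluster of samples.
    Because a base pairs at most once per structure, two pairs sharing a base
    cannot both exceed 50% frequency, so the result is a valid structure.
    """
    counts = Counter(pair for s in structures for pair in s)
    n = len(structures)
    return {pair for pair, c in counts.items() if c > n / 2}

# Hypothetical cluster of three sampled structures.
cluster = [
    {(0, 20), (1, 19), (5, 12)},
    {(0, 20), (1, 19)},
    {(0, 20), (1, 19), (5, 13)},
]
print(sorted(centroid(cluster)))  # [(0, 20), (1, 19)]
```

Applied to each cluster separately, this yields one representative structure per cluster, which is how several alternative candidates can be presented.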

12.

Background

The ever-increasing discovery of non-coding RNAs leads to unprecedented demand for the accurate modeling of RNA folding, including the predictions of two-dimensional (base pair) and three-dimensional all-atom structures and folding stabilities. Accurate modeling of RNA structure and stability has far-reaching impact on our understanding of RNA functions in human health and our ability to design RNA-based therapeutic strategies.

Results

The Vfold server offers a web interface to predict (a) RNA two-dimensional structure from the nucleotide sequence, (b) three-dimensional structure from the two-dimensional structure and the sequence, and (c) folding thermodynamics (heat capacity melting curve) from the sequence. To predict the two-dimensional structure (base pairs), the server generates an ensemble of structures, including loop structures with the different intra-loop mismatches, and evaluates the free energies using the experimental parameters for the base stacks and the loop entropy parameters given by a coarse-grained RNA folding model (the Vfold model) for the loops. To predict the three-dimensional structure, the server assembles the motif scaffolds using structure templates extracted from the known PDB structures and refines the structure using all-atom energy minimization.

Conclusions

The Vfold-based web server provides a user-friendly tool for the prediction of RNA structure and stability. The web server and the source code are freely accessible for public use at “http://rna.physics.missouri.edu”.
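A heat capacity melting curve follows from the partition function: for a discrete ensemble of states with Boltzmann probabilities, C(T) = d⟨H⟩/dT = (⟨H²⟩ − ⟨H⟩²)/(RT²). The toy sketch below assumes a hypothetical two-state hairpin whose states have fixed, temperature-independent enthalpy and entropy (kcal/mol and kcal/(mol·K)); Vfold's actual ensemble and loop entropy parameters are far richer.

```python
import math

R = 0.0019872  # gas constant, kcal/(mol*K)

def heat_capacity(states, temp_k):
    """Heat capacity C(T) = Var(H) / (R T^2) for a discrete ensemble.

    states: list of (enthalpy, entropy) pairs in kcal/mol and kcal/(mol*K),
    assumed temperature independent.
    """
    rt = R * temp_k
    free = [h - temp_k * s for h, s in states]  # G = H - T*S for each state
    g_min = min(free)
    weights = [math.exp(-(g - g_min) / rt) for g in free]
    z = sum(weights)
    probs = [w / z for w in weights]
    mean_h = sum(p * h for p, (h, _) in zip(probs, states))
    mean_h2 = sum(p * h * h for p, (h, _) in zip(probs, states))
    return (mean_h2 - mean_h ** 2) / (R * temp_k ** 2)

# Hypothetical two-state hairpin: unfolded reference state vs. folded state.
# Melting temperature = 40 / 0.12 K, i.e. about 60 degrees C.
states = [(0.0, 0.0), (-40.0, -0.12)]
curve = [(t, heat_capacity(states, t + 273.15)) for t in range(20, 101, 5)]
peak_t = max(curve, key=lambda tc: tc[1])[0]
print(peak_t)  # 60
```

For a two-state system the heat capacity peak sits near the melting temperature, where folded and unfolded states are equally populated.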

13.
During recent years, miRNAs have been shown to play important roles in the regulation of gene expression. Accordingly, much effort has been put into the discovery of novel uncharacterized miRNAs in various organisms. miRNAs are structurally defined by a hairpin-loop structure recognized by the two-step processing apparatus, Drosha and Dicer, necessary for the production of mature ∼22-nucleotide miRNA guide strands. With the emergence of high-throughput sequencing applications, tools have been developed to identify miRNAs and profile their expression based on sequencing reads. However, as the read depth increases, false-positive predictions increase using established algorithms, underscoring the need for more stringent approaches. Here we describe a transparent pipeline for confident miRNA identification in animals, termed miRdentify. We show that miRdentify confidently discloses more than 400 novel miRNAs in humans, including the first male-specific miRNA, which we successfully validate. Moreover, novel miRNAs are predicted in the mouse, the fruit fly and nematodes, suggesting that the pipeline applies to all animals. The entire software package is available at www.ncrnalab.dk/mirdentify.

14.
Short interfering RNAs (siRNAs) are a class of regulatory effectors that enforce gene silencing through formation of RNA duplexes. Although progress has been made in identifying the capabilities of siRNAs in silencing foreign RNA and transposable elements, siRNA functions in endogenous gene regulation have remained mysterious. In certain organisms, siRNA biosynthesis involves novel enzymes that act as RNA-directed RNA polymerases (RdRPs). Here we analyze the function of a Caenorhabditis elegans RdRP, RRF-3, during spermatogenesis. We found that loss of RRF-3 function resulted in pleiotropic defects in sperm development and that sperm defects led to embryonic lethality. Notably, sperm nuclei in mutants of either rrf-3 or another component of the siRNA pathway, eri-1, were frequently surrounded by ectopic microtubule structures, with spindle abnormalities in a subset of the resulting embryos. Through high-throughput small RNA sequencing, we identified a population of cellular mRNAs from spermatogenic cells that appear to serve as templates for antisense siRNA synthesis. This set of genes includes the majority of genes known to have enriched expression during spermatogenesis, as well as many genes not previously known to be expressed during spermatogenesis. In a subset of these genes, we found that RRF-3 was required for effective siRNA accumulation. These and other data suggest a working model in which a major role of the RRF-3/ERI pathway is to generate siRNAs that set patterns of gene expression through feedback repression of a set of critical targets during spermatogenesis.
Repression of gene expression by small RNAs of ∼20–30 nt in length is important for many aspects of multicellular eukaryotic development. A variety of classes of small RNA with distinct structural features, modes of biogenesis, and biological functions have been identified (reviewed in Hutvagner and Simard 2008).
We are particularly interested in a class of small RNAs, called endogenous short interfering RNAs (siRNAs), that are similar to intermediates in exogenously triggered RNA interference (RNAi) in their perfect complementarity to mRNA targets. High-throughput sequencing technology has provided a valuable tool for characterization of endogenous siRNA populations from many diverse sources, including mouse embryonic stem cells (Babiarz et al. 2008), Drosophila heads (Ghildiyal et al. 2008), and Arabidopsis pollen (Slotkin et al. 2009). These siRNAs have been proposed to function in the regulation of both cellular processes and genome defense through downregulation of gene expression. Caenorhabditis elegans, like plants and fungi, utilizes RNA-copying enzymes called RNA-directed RNA polymerases (RdRPs) as part of the RNAi machinery (Smardon et al. 2000; Sijen et al. 2001). While two of the C. elegans RdRPs are nonessential (RRF-1 and RRF-2), mutations in either of the remaining two (EGO-1 or RRF-3) lead to fertility defects (Smardon et al. 2000; Simmer et al. 2002). RRF-3 is functionally distinct from EGO-1 in that the RRF-3 requirement in fertility is temperature dependent. In addition, RRF-3 activity has an inhibitory effect on exogenously triggered RNAi (resulting in an ERI, or enhanced RNAi, mutant phenotype in rrf-3 mutants). Mutants lacking either RRF-3 or another ERI factor, ERI-1, have been used as experimental tools because of their enhanced sensitivity in RNAi-based screens. One proposed mechanism for the enhancement in RNAi in rrf-3 and eri mutants has been a competition for cofactors between the exogenously triggered RNAi pathway and an endogenous RNAi pathway. Consistent with this hypothesis, siRNAs corresponding to several genes have been shown by Northern analysis to depend upon RRF-3 and other ERI factors for their accumulation (Duchaine et al. 2006; Lee et al. 2006; Yigit et al. 2006). 
Global microarray analyses have also been undertaken to identify messenger RNAs whose expression is affected by RRF-3 and ERI-1 (Lee et al. 2006; Asikainen et al. 2007).
A functional significance of the RRF-3/ERI pathway has been inferred by the inability of rrf-3, eri-1, eri-3, and eri-5 mutant strains to propagate at a high growth temperature (Simmer et al. 2002; Duchaine et al. 2006). Rather than producing temperature-sensitive mutant protein effects, RRF-3 and other ERI proteins are thought to act in a temperature-sensitive process, as evidenced by the predicted truncated and presumed nonfunctional protein fragments that would result from the available deletion alleles and by their shared temperature-sensitive phenotypes. rrf-3 mutant animals have been observed to exhibit X-chromosome missegregation (Simmer et al. 2002) and an unusual persistence of a chromatin mark on the X chromosome during male spermatogenesis (Maine et al. 2005). X-chromosome missegregation and defective spermatogenesis have been referred to in previous studies of eri-1 (Kennedy et al. 2004) and eri-3 and eri-5 (Duchaine et al. 2006). Furthermore, eri-3 mutant sterility can be rescued by insemination with wild-type sperm (Duchaine et al. 2006).
Here we investigated the role of RRF-3 during spermatogenesis. We found defects evident at multiple stages, including after fertilization, where defects in rrf-3 mutant sperm can produce subsequent nonviable embryos. By using high-throughput sequencing, we characterized a large population of siRNAs present in spermatogenic cells and found a strong enrichment for antisense siRNAs from genes with known mRNA expression during spermatogenesis. While the majority of siRNA production during spermatogenesis does not require RRF-3, we found a set of genes for which siRNA production was dependent upon RRF-3. Existing data indicate increased expression for these genes in rrf-3 and/or eri-1 mutants.
Taken together, our analyses suggest a working model in which the RRF-3/ERI pathway generates siRNAs that downregulate specific genes during spermatogenesis, with this regulation playing a key role in generating functional sperm.

15.

Background

Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate.

Results

We developed the SFESA web server to refine pairwise protein sequence alignments. Unlike the previous version of SFESA, which required a set of 3D coordinates for a protein, the new server searches a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used for evaluation of alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. In addition, SFESA also improves alignments generated by other software.

Conclusions

SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.
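The local-shift refinement described above can be pictured as follows: for one alignment block, try moving the query segment a few positions left or right relative to the template and keep the shift with the best score. A toy sketch, assuming a simple identity count in place of SFESA's combined profile-profile and contact-energy score; the function names, the shift range and the scoring are all hypothetical.

```python
def score_shift(query, template, q_start, t_start, length, shift):
    """Toy block score: identical residues when the query block is shifted."""
    score = 0
    for k in range(length):
        qi, ti = q_start + k + shift, t_start + k
        if 0 <= qi < len(query) and ti < len(template) and query[qi] == template[ti]:
            score += 1
    return score

def best_shift(query, template, q_start, t_start, length, max_shift=3):
    """Shift in [-max_shift, max_shift] with the highest score.

    Shifts are tried in order of increasing absolute value, so ties favor
    the smallest move away from the current alignment.
    """
    shifts = sorted(range(-max_shift, max_shift + 1), key=abs)
    return max(shifts, key=lambda s: score_shift(query, template, q_start, t_start, length, s))

# The template block "HELIX" is currently aligned two residues too early in the query.
print(best_shift("XXHELIXYY", "HELIXAAAA", q_start=0, t_start=0, length=5))  # 2
```

In a real refinement, each block would be shifted within the limits set by its neighboring blocks, and the score would reflect both sequence profiles and structural contacts.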

16.
It is a significant challenge to predict RNA secondary structures including pseudoknots. Here, a new algorithm capable of predicting pseudoknots of any topology, ProbKnot, is reported. ProbKnot assembles maximum expected accuracy structures from computed base-pairing probabilities in O(N²) time, where N is the length of the sequence. The performance of ProbKnot was measured by comparing predicted structures with known structures for a large database of RNA sequences with fewer than 700 nucleotides. The percentage of known pairs correctly predicted was 69.3%. Additionally, the percentage of predicted pairs in the known structure was 61.3%. This performance is the highest of four tested algorithms that are capable of pseudoknot prediction. The program is available for download at: http://rna.urmc.rochester.edu/RNAstructure.html.
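One simple way to assemble a structure from base-pairing probabilities without enforcing nesting is to keep a pair (i, j) only when i and j are each other's most probable partner. The sketch below is a simplified reading of that idea, not the RNAstructure implementation: it assumes a precomputed symmetric probability matrix (for example from a partition function calculation), omits any post-filtering of short helices, and uses a hypothetical function name and threshold. Because nesting is never checked, crossing pairs (pseudoknots) can appear, and the work is two passes over the N × N matrix, i.e. O(N²).

```python
def mutual_best_pairs(prob, min_prob=0.0):
    """Select pairs (i, j) where j is i's most probable partner and vice versa.

    prob: symmetric N x N list of lists of base-pairing probabilities.
    Runs in O(N^2): one scan per row to find each base's best partner,
    then a linear pass to keep mutually best pairs.
    """
    n = len(prob)
    best = []
    for i in range(n):
        j = max(range(n), key=lambda k: prob[i][k])
        best.append(j if prob[i][j] > min_prob else None)
    return [(i, j) for i, j in enumerate(best) if j is not None and i < j and best[j] == i]

# Hypothetical 6-nt example with two crossing high-probability pairs.
n = 6
prob = [[0.0] * n for _ in range(n)]
for i, j, p in [(0, 3, 0.8), (1, 4, 0.7), (2, 5, 0.1), (0, 4, 0.2)]:
    prob[i][j] = prob[j][i] = p
print(mutual_best_pairs(prob, min_prob=0.5))  # [(0, 3), (1, 4)]: crossing pairs, i.e. a pseudoknot
```

Since each base has at most one best partner, the selected pairs never share a base, so the output is always a valid (possibly pseudoknotted) pairing.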

19.

Background

Next-generation sequencing technology provides a means to study genetic exchange at a higher resolution than was possible using earlier technologies. However, this improvement presents challenges, as the alignments of next-generation sequence data to a reference genome cannot be directly used as input to existing detection algorithms, which instead typically use multiple sequence alignments as input. We therefore designed a software suite called REDHORSE that uses genomic alignments, extracts genetic markers, and generates multiple sequence alignments that can be used as input to existing recombination detection algorithms. In addition, REDHORSE implements a custom recombination detection algorithm that makes use of sequence information and genomic positions to accurately detect crossovers. REDHORSE is a portable and platform-independent suite that provides efficient analysis of genetic crosses based on Next-generation sequencing data.

Results

We demonstrated the utility of REDHORSE using simulated data and real Next-generation sequencing data. The simulated dataset mimicked recombination between two known haploid parental strains and allowed comparison of detected break points against known true break points to assess performance of recombination detection algorithms. A newly generated NGS dataset from a genetic cross of Toxoplasma gondii allowed us to demonstrate our pipeline. REDHORSE successfully extracted the relevant genetic markers and was able to transform the read alignments from NGS to the genome to generate multiple sequence alignments. The recombination detection algorithm in REDHORSE was able to detect conventional crossovers and double crossovers typically associated with gene conversions, while filtering out artifacts that might have been introduced during sequencing or alignment. REDHORSE outperformed other commonly used recombination detection algorithms in finding conventional crossovers. In addition, REDHORSE was the only algorithm that was able to detect double crossovers.

Conclusion

REDHORSE is an efficient analytical pipeline that serves as a bridge between genomic alignments and existing recombination detection algorithms. Moreover, REDHORSE is equipped with a recombination detection algorithm specifically designed for Next-generation sequencing data. REDHORSE is a portable, platform-independent, Java-based utility that provides efficient analysis of genetic crosses based on Next-generation sequencing data. REDHORSE is available at http://redhorse.sourceforge.net/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1309-7) contains supplementary material, which is available to authorized users.
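At its simplest, crossover detection in a haploid cross means finding positions along a chromosome where the inferred parental origin of consecutive markers switches. The sketch below is a simplified illustration, not REDHORSE's algorithm: it assumes each marker has already been assigned to parent "A" or "B", reports each switch as a break-point interval, and flags two switches enclosing at most a chosen number of markers as a possible double crossover. The function name and the threshold are hypothetical, and no artifact filtering is attempted.

```python
def find_breakpoints(positions, origins, max_dco_markers=3):
    """Locate parental-origin switches along one chromosome.

    positions: sorted genomic positions of markers.
    origins: parent label ("A" or "B") inferred at each marker.
    Returns (breakpoints, double_crossovers); each breakpoint is the interval
    (last position of the old parent, first position of the new parent).
    """
    breakpoints, switch_idx = [], []
    for k in range(1, len(origins)):
        if origins[k] != origins[k - 1]:
            breakpoints.append((positions[k - 1], positions[k]))
            switch_idx.append(k)
    # Two consecutive switches close together bracket a short tract of the
    # other parent: a possible double crossover (gene conversion candidate).
    double_crossovers = [
        (breakpoints[m], breakpoints[m + 1])
        for m in range(len(switch_idx) - 1)
        if switch_idx[m + 1] - switch_idx[m] <= max_dco_markers
    ]
    return breakpoints, double_crossovers

positions = [100, 200, 300, 400, 500, 600, 700, 800]
origins = ["A", "A", "A", "B", "A", "A", "B", "B"]
bps, dcos = find_breakpoints(positions, origins, max_dco_markers=1)
print(bps)   # [(300, 400), (400, 500), (600, 700)]
print(dcos)  # [((300, 400), (400, 500))]
```

A single-marker tract like the one at position 400 could equally be a genotyping or alignment error, which is why filtering such artifacts, as the abstract describes, matters in practice.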


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP registration No. 09084417