Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
2.
3.
Identification and annotation of noncoding RNAs in Saccharomycotina   (Total citations: 1; self-citations: 0; citations by others: 1)
The importance of ncRNAs in biological processes makes their annotation an essential component of any genome-sequencing project. The identification of ncRNAs in genomes requires specific expertise and tools that are distinct from the traditional protein gene annotation tools. Here, we describe the assembly of two automatic annotation pipelines, integrating publicly available tools, for homology and de novo ncRNA search in genomes. We applied both pipelines to 10 Saccharomycotina genomes and were able to find and annotate 693 ncRNA genes, corresponding to 81% of the ncRNAs expected for those genomes assuming the number of ncRNAs in Saccharomyces cerevisiae (86) as a reference. Several new ncRNAs, not yet known in the Saccharomycotina clade, were also detected. The results show the feasibility of automatic search for ncRNAs in full genomes and the utility of such approaches in large multi-genome sequencing and annotation projects.
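The 81% figure in the abstract above can be reproduced from the numbers it states. The following is a minimal sketch of that arithmetic only; the variable names are illustrative and are not taken from the paper or its pipelines.

```python
# Figures quoted in the abstract above.
genomes = 10
reference_ncrnas_per_genome = 86  # ncRNAs in S. cerevisiae, used as the reference
annotated_ncrnas = 693

# Expected total if every genome carried as many ncRNAs as the reference.
expected_total = genomes * reference_ncrnas_per_genome
recovery = annotated_ncrnas / expected_total

print(f"expected: {expected_total}, recovered: {recovery:.1%}")
```

The expected total is 860, and 693 of 860 rounds to the 81% reported.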

4.

Background

Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals.

Results

We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure similarity search as well as class specific methods, we obtained a conservative set with a total of 3,391 structured RNA loci of which 1,011 and 2,314, respectively, hold strong sequence and structure similarity to structured RNAs in existing databases. The RNA loci cover 139 cis-regulatory element loci, 58 lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the aforementioned classes. When running the pipeline on a local shuffled version of the genome, we obtained no matches at the highest confidence level. Additional analysis of RNA-seq data from a pooled library from 10 different pig tissues added another 165 miRNA loci, yielding an overall annotation of 3,556 structured RNA loci. This annotation represents our best effort at making an automated annotation. To further enhance the reliability, 571 of the 3,556 structured RNAs were manually curated by methods depending on the RNA class while 1,581 were declared as pseudogenes. We further created a multiple alignment of pig against 20 representative vertebrates, from which RNAz predicted 83,859 de novo RNA loci with conserved RNA structures. 528 of the RNAz predictions overlapped with the homology based annotation or novel miRNAs. We further present a substantial synteny analysis which includes 1,004 lineage specific de novo RNA loci and 4 ncRNA loci in the known annotation specific for Laurasiatheria (pig, cow, dolphin, horse, cat, dog, hedgehog).
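The locus counts reported in the Results paragraph above are internally consistent, which a short tally makes explicit. This is only a bookkeeping sketch using the numbers quoted in the abstract; the dictionary keys are shorthand labels chosen here, not identifiers from the published annotation.

```python
# ncRNA gene counts per class, as quoted in the abstract.
ncrna_genes = {
    "miRNA": 359,
    "ribozyme": 8,
    "rRNA": 185,
    "snoRNA": 638,
    "snRNA": 1030,
    "tRNA": 810,
    "other": 153,
}
ncrna_gene_total = sum(ncrna_genes.values())

# Structured RNA loci in the conservative, homology-based set.
loci = {
    "cis-regulatory elements": 139,
    "lncRNA": 58,
    "annotation conflicts": 11,
    "ncRNA genes": ncrna_gene_total,
}
homology_total = sum(loci.values())

# miRNA loci added from the pooled RNA-seq library.
overall_total = homology_total + 165

print(ncrna_gene_total, homology_total, overall_total)
```

The three totals match the 3,183 ncRNA genes, 3,391 structured RNA loci, and 3,556 loci overall that the abstract reports.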

Conclusions

We have obtained one of the most comprehensive annotations for structured ncRNAs of a mammalian genome, which is likely to play central roles in both health modelling and production. The core annotation is available in Ensembl 70 and the complete annotation is available at http://rth.dk/resources/rnannotator/susscr102/version1.02.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-459) contains supplementary material, which is available to authorized users.

5.
Alternative splicing generates functional diversity in higher organisms through alternative first and last exons, skipped and included exons, intron retentions, and alternative donor and acceptor sites. In large-scale microarray studies in humans and the mouse, emphasis so far has been placed on exon-skip events, leaving the prevalence and importance of other splice types largely unexplored. Using a new human splice variant database and a genome-wide microarray to probe thousands of splice events of each type, we measured differential expression of splice types across six pairs of diverse cell lines and validated the database annotation process. Results suggest that splicing in humans is more complex than simple exon-skip events, which account for a minority of splicing differences. The relative frequency of differential expression of the splice types correlates with what is found by our annotation efforts. In conclusion, alternative splicing in human cells is considerably more complex than the canonical example of the exon skip. The complementary approaches of genome-wide annotation of alternative splicing in human and design of genome-wide splicing microarrays to measure differential splicing in biological samples provide a powerful high-throughput tool to study the role of alternative splicing in human biology.

6.
Here we present the results of a large-scale bioinformatics annotation of non-coding RNA loci in 48 avian genomes. Our approach uses probabilistic models of hand-curated families from the Rfam database to infer conserved RNA families within each avian genome. We supplement these annotations with predictions from the tRNA annotation tool, tRNAscan-SE and microRNAs from miRBase. We identify 34 lncRNA-associated loci that are conserved between birds and mammals and validate 12 of these in chicken. We report several intriguing cases where a reported mammalian lncRNA, but not its function, is conserved. We also demonstrate extensive conservation of classical ncRNAs (e.g., tRNAs) and more recently discovered ncRNAs (e.g., snoRNAs and miRNAs) in birds. Furthermore, we describe numerous “losses” of several RNA families, and attribute these to either genuine loss, divergence or missing data. In particular, we show that many of these losses are due to the challenges associated with assembling avian microchromosomes. These combined results illustrate the utility of applying homology-based methods for annotating novel vertebrate genomes.

7.
Recently, genome-wide surveys for non-coding RNAs have provided evidence for tens of thousands of previously undescribed evolutionarily conserved RNAs with distinctive secondary structures. The annotation of these putative ncRNAs, however, remains a difficult problem. Here we describe an SVM-based approach that, in conjunction with a non-stringent filter for consensus secondary structures, is capable of efficiently recognizing microRNA precursors in multiple sequence alignments. The software was applied to recent genome-wide RNAz surveys of mammals, urochordates, and nematodes. AVAILABILITY: The program RNAmicro is available as source code and can be downloaded from http://www.bioinf.uni-leipzig/Software/RNAmicro.

8.
SUMMARY: We have launched a web server, which serves as a general-purpose idiogram rendering service, and allows users to generate high-quality idiograms with custom annotation according to their own genome-wide mapping/annotation data through an easy-to-use interface. The generated idiograms are suitable not only for visualizing summaries of genome-wide analysis but also for many types of presentation material including web pages, conference posters, oral presentations, etc. AVAILABILITY: Idiographica is freely available at http://www.ncrna.org/idiographica/

9.
The baker's yeast mutation collections are extensively used genetic resources that are the basis for many genome-wide screens and new technologies. Anecdotal evidence has previously pointed to the putative existence of a neighboring gene effect (NGE) in these collections. NGE occurs when the phenotype of a strain carrying a particular perturbed gene is due to the lack of proper function of its adjacent gene. Here we performed a large-scale study of NGEs, presenting a network-based algorithm for detecting NGEs and validating software predictions using complementation experiments. We applied our approach to four datasets uncovering a similar magnitude of NGE in each (7-15%). These results have important consequences for systems biology, as the mutation collections are extensively used in almost every aspect of the field, from genetic network analysis to functional gene annotation.

10.
Deep annotation of Populus trichocarpa microRNAs from diverse tissue sets   (Total citations: 1; self-citations: 0; citations by others: 1)

11.
12.
Gene annotation, as measured by links to the biomedical literature and funded grants, is governed by a power law, indicating that researchers favor the extensive study of relatively few genes. This emphasizes the need for data-driven science to accomplish genome-wide gene annotation.
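One common way to express a power law of this kind is that the annotation count of the gene at rank r falls off roughly as C · r^(−α), which appears as a straight line on a log-log plot. The sketch below estimates α by least squares on log-transformed values. It is an illustration only: the function name is hypothetical, the input is synthetic rather than data from the study, and the abstract does not state which form or fitting method the authors used. For real data, maximum-likelihood estimators are generally preferred over a log-log regression.

```python
import math


def fit_power_law_exponent(counts):
    """Estimate alpha in count ~ C * rank**(-alpha) via least squares on log-log values."""
    # Rank the positive counts from largest to smallest (rank 1 = most annotated gene).
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The slope of the log-log line is -alpha.
    return -sxy / sxx


# Synthetic counts that follow an exact power law with alpha = 1.5 (not study data).
synthetic_counts = [1000.0 * rank ** -1.5 for rank in range(1, 201)]
alpha = fit_power_law_exponent(synthetic_counts)
print(f"estimated alpha: {alpha:.3f}")
```

A large α corresponds to the pattern the abstract describes, in which a few genes attract most of the literature links and grants.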

13.
Evaluation of annotation strategies using an entire genome sequence   (Total citations: 2; self-citations: 0; citations by others: 2)
MOTIVATION: Genome-wide functional annotation either by manual or automatic means has raised considerable concerns regarding the accuracy of assignments and the reproducibility of methodologies. In addition, a performance evaluation of automated systems that attempt to tackle sequence analyses rapidly and reproducibly is generally missing. In order to quantify the accuracy and reproducibility of function assignments on a genome-wide scale, we have re-annotated the entire genome sequence of Chlamydia trachomatis (serovar D), in a collaborative manner. RESULTS: We have encoded all annotations in a structured format to allow further comparison and data exchange and have used a scale that records the different levels of potential annotation errors according to their propensity to propagate in the database due to transitive function assignments. We conclude that genome annotation may entail a considerable number of errors, ranging from simple typographical errors to complex sequence analysis problems. The most surprising result of this comparative study is that automatic systems might perform as well as the teams of experts annotating genome sequences.

14.
15.
16.
Small nucleolar RNAs (snoRNAs) are noncoding RNAs that direct 2′-O-methylation or pseudouridylation on ribosomal RNAs or spliceosomal small nuclear RNAs. These modifications are needed to modulate the activity of ribosomes and spliceosomes. A comprehensive repertoire of snoRNAs is needed to expand the knowledge of these modifications. The sequences corresponding to snoRNAs in 18–26-nt small RNA sequencing data have been rarely explored and remain as a hidden treasure for snoRNA annotation. Here, we showed the enrichment of small RNAs at Arabidopsis snoRNA termini and developed a computational approach to identify snoRNAs on the basis of this characteristic. The approach successfully uncovered the full-length sequences of 144 known Arabidopsis snoRNA genes, including some snoRNAs with improved 5′- or 3′-end annotation. In addition, we identified 27 and 17 candidates for novel box C/D and box H/ACA snoRNAs, respectively. Northern blot analysis and sequencing data from parallel analysis of RNA ends confirmed the expression and the termini of the newly predicted snoRNAs. Our study especially expanded on the current knowledge of box H/ACA snoRNAs and snoRNA species targeting snRNAs. In this study, we demonstrated that the use of small RNA sequencing data can increase the complexity and the accuracy of snoRNA annotation.

17.
18.
High-throughput RNA-seq has revolutionized the process of small RNA (sRNA) discovery, leading to a rapid expansion of sRNA categories. In addition to the previously well-characterized sRNAs such as microRNAs (miRNAs), piwi-interacting RNAs (piRNAs), and small nucleolar RNAs (snoRNAs), recent studies have spotlighted tRNA-derived sRNAs (tsRNAs) and rRNA-derived sRNAs (rsRNAs) as new categories of sRNAs that bear versatile functions. Since existing software and pipelines for sRNA annotation are mostly focused on analyzing miRNAs or piRNAs, here we developed the sRNA annotation pipeline optimized for rRNA- and tRNA-derived sRNAs (SPORTS1.0). SPORTS1.0 is optimized for analyzing tsRNAs and rsRNAs from sRNA-seq data, in addition to its capacity to annotate canonical sRNAs such as miRNAs and piRNAs. Moreover, SPORTS1.0 can predict potential RNA modification sites based on nucleotide mismatches within sRNAs. SPORTS1.0 is precompiled to annotate sRNAs for a wide range of 68 species across bacteria, yeast, plant, and animal kingdoms, while additional species for analyses could be readily expanded upon end users’ input. For demonstration, by analyzing sRNA datasets using SPORTS1.0, we reveal that distinct signatures are present in tsRNAs and rsRNAs from different mouse cell types. We also find that compared to other sRNA species, tsRNAs bear the highest mismatch rate, which is consistent with their highly modified nature. SPORTS1.0 is open-source software and can be publicly accessed at https://github.com/junchaoshi/sports1.0.

19.
Small RNAs regulate gene expression and most genes in the worm Caenorhabditis elegans are subject to their regulation. Here, we analyze small RNA data sets and use reproducible features of RNAs present in multiple data sets to discover a new class of small RNAs and to reveal insights into two known classes of small RNAs: 22G RNAs and 26G RNAs. We found that reproducibly detected 22-nt RNAs, although predominantly RNAs with a G at the 5′ end, also include RNAs with A, C, or U at the 5′ end. These RNAs are synthesized downstream from characteristic sequence motifs on mRNA and have U-tailed derivatives. Analysis of 26G RNAs revealed that they are processed from a blunt end of double-stranded RNAs and that production of one 26G RNA generates a hotspot immediately downstream for production of another. To our surprise, analysis of RNAs shorter than 18 nt revealed a new class of RNAs, which we call NU RNAs (pronounced “new RNAs”) because they have a NU bias at the 5′ end, where N is any nucleotide. NU RNAs are antisense to genes and originate downstream from U bases on mRNA. Although many genes have complementary NU RNAs, their genome-wide distribution is distinct from that of previously known classes of small RNAs. Our results suggest that current approaches underestimate reproducibly detected RNAs that are shorter than 18 nt, and theoretical considerations suggest that such shorter RNAs could be used for sequence-specific gene regulation in organisms like C. elegans that have small genomes.
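The naming conventions mentioned in the abstract above (22G, 26G, NU) can be illustrated with a toy labeling function based only on read length and the first two nucleotides. This is a deliberately simplified sketch, not the authors' method: the function name is hypothetical, the abstract describes NU as a statistical bias rather than a strict rule, and it notes that reproducibly detected 22-nt RNAs also include reads starting with A, C, or U.

```python
def label_small_rna(seq):
    """Assign a coarse label from length and 5' nucleotides (simplified assumption)."""
    rna = seq.upper().replace("T", "U")  # accept DNA-style reads as well
    if len(rna) == 22 and rna.startswith("G"):
        return "22G"
    if len(rna) == 26 and rna.startswith("G"):
        return "26G"
    # "NU": any nucleotide followed by U at the 5' end, in reads shorter than 18 nt.
    if 2 <= len(rna) < 18 and rna[1] == "U":
        return "NU"
    return "other"


# Made-up example reads, not sequences from the study.
examples = ["G" + "A" * 21, "G" + "A" * 25, "ATGCATGCATGCATGC", "AAGC"]
for read in examples:
    print(len(read), label_small_rna(read))
```

The label says nothing about strand, genomic origin, or reproducibility across data sets, all of which the abstract treats as part of how each class is defined.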

20.