首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 546 毫秒
1.

Background

Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals.

Results

We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure similarity search as well as class specific methods, we obtained a conservative set with a total of 3,391 structured RNA loci of which 1,011 and 2,314, respectively, hold strong sequence and structure similarity to structured RNAs in existing databases. The RNA loci cover 139 cis-regulatory element loci, 58 lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the here fore mentioned classes. When running the pipeline on a local shuffled version of the genome, we obtained no matches at the highest confidence level. Additional analysis of RNA-seq data from a pooled library from 10 different pig tissues added another 165 miRNA loci, yielding an overall annotation of 3,556 structured RNA loci. This annotation represents our best effort at making an automated annotation. To further enhance the reliability, 571 of the 3,556 structured RNAs were manually curated by methods depending on the RNA class while 1,581 were declared as pseudogenes. We further created a multiple alignment of pig against 20 representative vertebrates, from which RNAz predicted 83,859 de novo RNA loci with conserved RNA structures. 528 of the RNAz predictions overlapped with the homology based annotation or novel miRNAs. We further present a substantial synteny analysis which includes 1,004 lineage specific de novo RNA loci and 4 ncRNA loci in the known annotation specific for Laurasiatheria (pig, cow, dolphin, horse, cat, dog, hedgehog).

Conclusions

We have obtained one of the most comprehensive annotations for structured ncRNAs of a mammalian genome, which is likely to play central roles in both health modelling and production. The core annotation is available in Ensembl 70 and the complete annotation is available at http://rth.dk/resources/rnannotator/susscr102/version1.02.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-459) contains supplementary material, which is available to authorized users.  相似文献   

2.
Identification and annotation of noncoding RNAs in Saccharomycotina   总被引:1,自引:0,他引:1  
The importance of ncRNAs in biological processes makes their annotation an essential component of any genome-sequencing project. The identification of ncRNAs in genomes requires specific expertise and tools that are distinct from the traditional protein gene annotation tools. Here, we describe the assembly of two automatic annotation pipelines, integrating publicly available tools, for homology and de novo ncRNA search in genomes. We applied both pipelines to 10 Saccharomycotina genomes and were able to find and annotate 693 ncRNA genes, corresponding to 81% of the ncRNAs expected for those genomes assuming the number of ncRNAs in Saccharomyces cerevisiae (86) as a reference. Several new ncRNAs, not yet known in the Saccharomycotina clade, were also detected. The results show the feasibility of automatic search for ncRNAs in full genomes and the utility of such approaches in large multi-genome sequencing and annotation projects.  相似文献   

3.
Hepadnaviridae are double-stranded DNA viruses that infect some species of birds and mammals. This includes humans, where hepatitis B viruses (HBVs) are prevalent pathogens in considerable parts of the global population. Recently, endogenized sequences of HBVs (eHBVs) have been discovered in bird genomes where they constitute direct evidence for the coexistence of these viruses and their hosts from the late Mesozoic until present. Nevertheless, virtually nothing is known about the ancient host range of this virus family in other animals. Here we report the first eHBVs from crocodilian, snake, and turtle genomes, including a turtle eHBV that endogenized >207 million years ago. This genomic “fossil” is >125 million years older than the oldest avian eHBV and provides the first direct evidence that Hepadnaviridae already existed during the Early Mesozoic. This implies that the Mesozoic fossil record of HBV infection spans three of the five major groups of land vertebrates, namely birds, crocodilians, and turtles. We show that the deep phylogenetic relationships of HBVs are largely congruent with the deep phylogeny of their amniote hosts, which suggests an ancient amniote–HBV coexistence and codivergence, at least since the Early Mesozoic. Notably, the organization of overlapping genes as well as the structure of elements involved in viral replication has remained highly conserved among HBVs along that time span, except for the presence of the X gene. We provide multiple lines of evidence that the tumor-promoting X protein of mammalian HBVs lacks a homolog in all other hepadnaviruses and propose a novel scenario for the emergence of X via segmental duplication and overprinting of pre-existing reading frames in the ancestor of mammalian HBVs. Our study reveals an unforeseen host range of prehistoric HBVs and provides novel insights into the genome evolution of hepadnaviruses throughout their long-lasting association with amniote hosts.  相似文献   

4.
5.
6.
We present a survey for non-coding RNAs and other structured RNA motifs in the genomes of Caenorhabditis elegans and Caenorhabditis briggsae using the RNAz program. This approach explicitly evaluates comparative sequence information to detect stabilizing selection acting on RNA secondary structure. We detect 3,672 structured RNA motifs, of which only 678 are known non-translated RNAs (ncRNAs) or clear homologs of known C. elegans ncRNAs. Most of these signals are located in introns or at a distance from known protein-coding genes. With an estimated false positive rate of about 50% and a sensitivity on the order of 50%, we estimate that the nematode genomes contain between 3,000 and 4,000 RNAs with evolutionary conserved secondary structures. Only a small fraction of these belongs to the known RNA classes, including tRNAs, snoRNAs, snRNAs, or microRNAs. A relatively small class of ncRNA candidates is associated with previously observed RNA-specific upstream elements.  相似文献   

7.
8.
9.
10.
Slack KE  Janke A  Penny D  Arnason U 《Gene》2003,302(1-2):43-52
We report complete mitochondrial (mt) genomes for a penguin (little blue, Eudyptula minor) and a goose (greater white-fronted, Anser albifrons). A revised annotation of avian and reptile mt genomes has been carried out, which improves consistency of labeling gene start and stop positions. In conjunction with this, a summary of mt gene features is presented and a number of conserved patterns and interesting differences identified. The protein-coding genes from the two new genomes were analysed together with those from 17 other birds plus outgroup (reptile) taxa. The unrooted amino acid tree from 19 avian genomes was locally stable with many high bootstrap values using several maximum likelihood methods. In particular, Anseriformes (goose and duck) grouped strongly with Galliformes (chicken) to form Gallianseres, while the penguin paired firmly with the stork. The position where the outgroup joined the avian tree varied with the combination of outgroup taxa used. The three best supported positions of the root were passerine, but the traditional rooting position between paleognaths and neognaths could not be excluded.  相似文献   

11.

Background

Only a small fraction of the mosquito species of the genus Anopheles are able to transmit malaria, one of the biggest killer diseases of poverty, which is mostly prevalent in the tropics. This diversity has genetic, yet unknown, causes. In a further attempt to contribute to the elucidation of these variances, the international “Anopheles Genomes Cluster Consortium” project (a.k.a. “16 Anopheles genomes project”) was established, aiming at a comprehensive genomic analysis of several anopheline species, most of which are malaria vectors. In the frame of the international consortium carrying out this project our team studied the genes encoding families of non-coding RNAs (ncRNAs), concentrating on four classes: microRNA (miRNA), ribosomal RNA (rRNA), small nuclear RNA (snRNA), and in particular small nucleolar RNA (snoRNA) and, finally, transfer RNA (tRNA).

Results

Our analysis was carried out using, exclusively, computational approaches, and evaluating both the primary NGS reads as well as the respective genome assemblies produced by the consortium and stored in VectorBase; moreover, the results of RNAseq surveys in cases in which these were available and meaningful were also accessed in order to obtain supplementary data, as were “pre-genomic era” sequence data stored in nucleic acid databases. The investigation included the identification and analysis, in most species studied, of ncRNA genes belonging to several families, as well as the analysis of the evolutionary relations of some of those genes in cross-comparisons to other members of the genus Anopheles.

Conclusions

Our study led to the identification of members of these gene families in the majority of twenty different anopheline taxa. A set of tools for the study of the evolution and molecular biology of important disease vectors has, thus, been obtained.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1038) contains supplementary material, which is available to authorized users.  相似文献   

12.
Large-scale analyses of protein complexes have recently become available for Escherichia coli and Mycoplasma pneumoniae, yielding 443 and 116 heteromultimeric soluble protein complexes, respectively. We have coupled the results of these mass spectrometry-characterized protein complexes with the 285 “gold standard” protein complexes identified by EcoCyc. A comparison with databases of gene orthology, conservation, and essentiality identified proteins conserved or lost in complexes of other species. For instance, of 285 “gold standard” protein complexes in E. coli, less than 10% are fully conserved among a set of 7 distantly-related bacterial “model” species. Complex conservation follows one of three models: well-conserved complexes, complexes with a conserved core, and complexes with partial conservation but no conserved core. Expanding the comparison to 894 distinct bacterial genomes illustrates fractional conservation and the limits of co-conservation among components of protein complexes: just 14 out of 285 model protein complexes are perfectly conserved across 95% of the genomes used, yet we predict more than 180 may be partially conserved across at least half of the genomes. No clear relationship between gene essentiality and protein complex conservation is observed, as even poorly conserved complexes contain a significant number of essential proteins. Finally, we identify 183 complexes containing well-conserved components and uncharacterized proteins which will be interesting targets for future experimental studies.  相似文献   

13.
Many bacteria encode an ortholog of the Ro60 autoantigen, a ring-shaped protein that is bound in animal cells to noncoding RNAs (ncRNAs) called Y RNAs. Studies in Deinococcus radiodurans revealed that Y RNA tethers Ro60 to polynucleotide phosphorylase, specializing this exoribonuclease for structured RNA degradation. Although Ro60 orthologs are present in a wide range of bacteria, Y RNAs have been detected in only two species, making it unclear whether these ncRNAs are common Ro60 partners in bacteria. In this study, we report that likely Y RNAs are encoded near Ro60 in >250 bacterial and phage species. By comparing conserved features, we discovered that at least one Y RNA in each species contains a domain resembling tRNA. We show that these RNAs contain nucleotide modifications characteristic of tRNA and are substrates for several enzymes that recognize tRNAs. Our studies confirm the importance of Y RNAs in bacterial physiology and identify a new class of ncRNAs that mimic tRNA.  相似文献   

14.
Avian genomes are of interest because the rapid metabolic rate associated with powered flight requires small cells which constrain genome size. Consequently, flying birds tend to have small genomes relative to other vertebrates such as mammals. It thus stands to reason that flying birds should have smaller genomes than ground-dwelling birds with lower metabolic rates. Small genomes could be condensed but uncompromised in a number of ways, including smaller intergenic intervals, shorter introns, and/or a reduced transposable element (TE) complement. We evaluated genome size in light of the orthologous TE complement among 41 flying (FY) and seven ground-dwelling (GD) bird species to determine if a preponderance of deletions in orthologous TEs might explain the compact genomes of flying birds with high metabolic rates. We measured, across multiple loci in all 48 species, the lengths of 50 contemporary orthologous chicken repeat 1 (CR1, a non-LTR retrotransposon) copies relative to inferred ancestral CR1 sequences. We found genome sizes in GD birds were not different than those in FY birds, but the mean lengths of orthologous CR1 loci were significantly shorter in FY birds than in GD birds. Moreover, we observed a negative correlation between basal metabolic rate and length of orthologous CR1 loci. Finally, we observed positive correlations between body mass and both genome sizes as well as length of orthologous CR1 loci, which we expected given that body mass correlates negatively with metabolic rates. Our results support the contention that metabolism helps shape the avian TE complement and thus indirectly contributes to the compact genomes of birds.  相似文献   

15.
Increasingly complex bioinformatic analysis is necessitated by the plethora of sequence information currently available. A total of 21 poxvirus genomes have now been completely sequenced and annotated, and many more genomes will be available in the next few years. First, we describe the creation of a database of continuously corrected and updated genome sequences and an easy-to-use and extremely powerful suite of software tools for the analysis of genomes, genes, and proteins. These tools are available free to all researchers and, in most cases, alleviate the need for using multiple Internet sites for analysis. Further, we describe the use of these programs to identify conserved families of genes (poxvirus orthologous clusters) and have named the software suite POCs, which is available at www.poxvirus.org. Using POCs, we have identified a set of 49 absolutely conserved gene families-those which are conserved between the highly diverged families of insect-infecting entomopoxviruses and vertebrate-infecting chordopoxviruses. An additional set of 41 gene families conserved in chordopoxviruses was also identified. Thus, 90 genes are completely conserved in chordopoxviruses and comprise the minimum essential genome, and these will make excellent drug, antibody, vaccine, and detection targets. Finally, we describe the use of these tools to identify necessary annotation and sequencing updates in poxvirus genomes. For example, using POCs, we identified 19 genes that were widely conserved in poxviruses but missing from the vaccinia virus strain Tian Tan 1998 GenBank file. We have reannotated and resequenced fragments of this genome and verified that these genes are conserved in Tian Tan. The results for poxvirus genes and genomes are discussed in light of evolutionary processes.  相似文献   

16.
17.
New microbial genomes are sequenced at a high pace, allowing insight into the genetics of not only cultured microbes, but a wide range of metagenomic collections such as the human microbiome. To understand the deluge of genomic data we face, computational approaches for gene functional annotation are invaluable. We introduce a novel model for computational annotation that refines two established concepts: annotation based on homology and annotation based on phyletic profiling. The phyletic profiling-based model that includes both inferred orthologs and paralogs—homologs separated by a speciation and a duplication event, respectively—provides more annotations at the same average Precision than the model that includes only inferred orthologs. For experimental validation, we selected 38 poorly annotated Escherichia coli genes for which the model assigned one of three GO terms with high confidence: involvement in DNA repair, protein translation, or cell wall synthesis. Results of antibiotic stress survival assays on E. coli knockout mutants showed high agreement with our model''s estimates of accuracy: out of 38 predictions obtained at the reported Precision of 60%, we confirmed 25 predictions, indicating that our confidence estimates can be used to make informed decisions on experimental validation. Our work will contribute to making experimental validation of computational predictions more approachable, both in cost and time. Our predictions for 998 prokaryotic genomes include ∼400000 specific annotations with the estimated Precision of 90%, ∼19000 of which are highly specific—e.g. “penicillin binding,” “tRNA aminoacylation for protein translation,” or “pathogenesis”—and are freely available at http://gorbi.irb.hr/.  相似文献   

18.

Background

Mammalian genomes commonly harbor endogenous viral elements. Due to a lack of comparable genome-scale sequence data, far less is known about endogenous viral elements in avian species, even though their small genomes may enable important insights into the patterns and processes of endogenous viral element evolution.

Results

Through a systematic screening of the genomes of 48 species sampled across the avian phylogeny we reveal that birds harbor a limited number of endogenous viral elements compared to mammals, with only five viral families observed: Retroviridae, Hepadnaviridae, Bornaviridae, Circoviridae, and Parvoviridae. All nonretroviral endogenous viral elements are present at low copy numbers and in few species, with only endogenous hepadnaviruses widely distributed, although these have been purged in some cases. We also provide the first evidence for endogenous bornaviruses and circoviruses in avian genomes, although at very low copy numbers. A comparative analysis of vertebrate genomes revealed a simple linear relationship between endogenous viral element abundance and host genome size, such that the occurrence of endogenous viral elements in bird genomes is 6- to 13-fold less frequent than in mammals.

Conclusions

These results reveal that avian genomes harbor relatively small numbers of endogenous viruses, particularly those derived from RNA viruses, and hence are either less susceptible to viral invasions or purge them more effectively.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0539-3) contains supplementary material, which is available to authorized users.  相似文献   

19.
Presence/absence patterns of retroposon insertions at orthologous genomic loci constitute straightforward markers for phylogenetic or population genetic studies. In birds, the convenient identification and utility of these markers has so far been mainly restricted to the lineages leading to model birds (i.e., chicken and zebra finch). We present an easy-to-use, rapid, and cost-effective method for the experimental isolation of chicken repeat 1 (CR1) insertions from virtually any bird genome and potentially nonavian genomes. The application of our method to the little grebe genome yielded insertions belonging to new CR1 subfamilies that are scattered all across the phylogenetic tree of avian CR1s. Furthermore, presence/absence analysis of these insertions provides the first retroposon evidence grouping flamingos + grebes as Mirandornithes and several markers for all subsequent branching events within grebes (Podicipediformes). Five markers appear to be species-specific insertions, including the hitherto first evidence in birds for biallelic CR1 insertions that could be useful in future population genetic studies.  相似文献   

20.
Adaptive sequential behavior is a hallmark of human cognition. In particular, humans can learn to produce precise spatiotemporal sequences given a certain context. For instance, musicians can not only reproduce learned action sequences in a context-dependent manner, they can also quickly and flexibly reapply them in any desired tempo or rhythm without overwriting previous learning. Existing neural network models fail to account for these properties. We argue that this limitation emerges from the fact that sequence information (i.e., the position of the action) and timing (i.e., the moment of response execution) are typically stored in the same neural network weights. Here, we augment a biologically plausible recurrent neural network of cortical dynamics to include a basal ganglia-thalamic module which uses reinforcement learning to dynamically modulate action. This “associative cluster-dependent chain” (ACDC) model modularly stores sequence and timing information in distinct loci of the network. This feature increases computational power and allows ACDC to display a wide range of temporal properties (e.g., multiple sequences, temporal shifting, rescaling, and compositionality), while still accounting for several behavioral and neurophysiological empirical observations. Finally, we apply this ACDC network to show how it can learn the famous “Thunderstruck” song intro and then flexibly play it in a “bossa nova” rhythm without further training.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号